ERROR when running `scape infer_pa`

wangzhenzZ commented 4 months ago

In the tutorial SCAPE-toy-example.ipynb there is only one pkl file in pkl_input, but after running `scape prepare_input` on my data I get 112 pkl files. Is that normal? And how should I run the next step, `scape infer_pa`? I tried the following code:

for filename in "${output_dir}"/pkl_input/*.input.pkl
  do
    scape infer_pa \
    --pkl_input_file "$filename" \
    --output_dir "$output_dir"
  done

This produced the following error:

start each UTR region
Traceback (most recent call last):
  File "/miniconda3/envs/scape_env/bin/scape", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/__init__.py", line 5, in main
    cli(prog_name="scape")
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 91, in infer_pa
    _infer_pa(pkl_input_file, output_dir, **para_dict)
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 134, in _infer_pa
    infer(pkl_input_file, out_pkl_file, **kwargs)
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 1116, in infer
    res = subsample_run(**args)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 1008, in subsample_run
    apamodel = ApaModel(**kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 358, in __init__
    bin_data(data, x_step=5, l_step=10, r_step=10, pa_step=5)
  File "/miniconda3/envs/scape_env/lib/python3.11/site-packages/scape/apa_core.py", line 311, in bin_data
    new_x_arr = np.bincount(idx_arr, x_arr) / cnt_arr
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: object too deep for desired array
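For context, `np.bincount` only accepts 1-D inputs, and NumPy raises this `ValueError` ("object too deep for desired array") when an argument has more dimensions than that. The sketch below re-creates the binned-mean pattern from the `bin_data` frame of the traceback in plain NumPy; the helper name `binned_mean` is hypothetical and this is not scape's code. It only shows the mechanism: whether `x_arr` (or `idx_arr`) actually reached scape's `bin_data` as a 2-D array is an assumption the thread does not confirm.

```python
import numpy as np


def binned_mean(idx_arr, x_arr):
    """Mean of x_arr within each integer bin of idx_arr (hypothetical helper).

    Mirrors the `np.bincount(idx_arr, x_arr) / cnt_arr` line in the traceback:
    the weighted count sums x per bin, the plain count gives the bin sizes.
    Empty bins would yield nan (0/0).
    """
    idx_arr = np.asarray(idx_arr)
    x_arr = np.asarray(x_arr, dtype=float)
    cnt_arr = np.bincount(idx_arr)
    # Both arguments must be 1-D; a 2-D array such as shape (n, 1) raises
    # "ValueError: object too deep for desired array".
    return np.bincount(idx_arr, x_arr) / cnt_arr


if __name__ == "__main__":
    idx = np.array([0, 1, 1, 2])
    x = np.array([10.0, 20.0, 40.0, 5.0])
    print(binned_mean(idx, x))  # per-bin means: 10, 30, 5
    column = x.reshape(-1, 1)  # same values, but shape (4, 1)
    try:
        binned_mean(idx, column)
    except ValueError as err:
        print("ValueError:", err)
    print(binned_mean(idx, column.ravel()))  # flattening back to 1-D works
```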

I've also attached my current environment: package-list.txt

Any suggestions will be appreciated.
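The shell loop in the question runs `scape infer_pa` once per chunk file and keeps going after a failure, so it does not report which chunk file failed. A minimal Python sketch of the same loop that stops at the first failing chunk and returns its path is below. It reuses only the two flags shown in the question; the helper names `build_infer_pa_commands` and `run_until_failure` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_infer_pa_commands(output_dir):
    """Build one `scape infer_pa` argv list per chunk file in <output_dir>/pkl_input."""
    out = Path(output_dir)
    # Same glob as the shell loop: every *.input.pkl written by prepare_input.
    chunk_files = sorted(out.joinpath("pkl_input").glob("*.input.pkl"))
    return [
        ["scape", "infer_pa",
         "--pkl_input_file", str(f),
         "--output_dir", str(out)]
        for f in chunk_files
    ]


def run_until_failure(commands, runner=subprocess.run):
    """Run each command in order; return the chunk file of the first failure, or None."""
    for argv in commands:
        result = runner(argv)
        if result.returncode != 0:
            # argv[3] is the value passed to --pkl_input_file.
            return argv[3]
    return None
```

For example, `failed = run_until_failure(build_infer_pa_commands(output_dir))` leaves `failed` as `None` if every chunk succeeded, otherwise the path of the chunk file worth sharing for debugging. The `runner` parameter only exists so the loop can be exercised without scape installed.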

ThuyTien1 commented 4 months ago

@wangzhenzZ It's normal to have more than one file in pkl_input: your BAM file has data for more than 100 UTR regions, so the input is split across several files. Can you provide the file that caused the error? If not, since the error seems to happen on the first object in the pickle file, please print an overview of that object with the following code:

import pickle

with open("your_file_name.pkl", "rb") as fh:
    a = pickle.load(fh)
print(a)
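If a chunk file holds several objects pickled one after another, as the "first object" wording suggests, a single `pickle.load` only returns the first one. A minimal sketch for reading all of them, assuming that back-to-back layout (the helper name `iter_pickle_records` is hypothetical):

```python
import pickle


def iter_pickle_records(path):
    """Yield every object in a file that holds several pickles written back to back."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                # pickle.load raises EOFError once the file is exhausted.
                return
```

For example, `records = list(iter_pickle_records("your_file_name.pkl"))` followed by `print(len(records))` shows how many records a chunk file contains.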

wangzhenzZ commented 4 months ago

Here is the output for that file:

import pickle

with open('possorted_genome_bam.100.112.100.input.pkl', 'rb') as file:
    a = pickle.load(file)

print(a)

('X:ENSSSCG00000012271:1:41803135-41803953:+',         x    l   r     pa    cb_id  read_id  junction     seg1_en     seg2_en
0      22  120 NaN    NaN  1052804        0         1  41803214.0  41803341.0
1      26  122 NaN    NaN   443991        1         1  41803214.0  41803349.0
2       2  150 NaN    NaN   492860        2         1  41803214.0  41803381.0
3       2  116 NaN    NaN   325148        3         1  41803214.0  41803347.0
4      36  120 NaN    NaN   502991        4         1  41803214.0  41803355.0
...   ...  ...  ..    ...      ...      ...       ...         ...         ...
1545  475   35 NaN  509.0   591034     1545         0         NaN         NaN
1546  476   43 NaN  518.0    20052     1546         0         NaN         NaN
1547  480   39 NaN  518.0   685085     1547         0         NaN         NaN
1548  484   43 NaN  526.0   841147     1548         0         NaN         NaN
1549  516   33 NaN  548.0   167917     1549         0         NaN         NaN

[1550 rows x 9 columns])

ThuyTien1 commented 4 months ago

@wangzhenzZ Is it possible for you to write this data to a separate pickle file and attach it here so that I can debug it?

wangzhenzZ commented 4 months ago

I noticed that GitHub issues do not accept pickle files as attachments. Could you give me an email address where I can send the pickle file?

ThuyTien1 commented 4 months ago

@wangzhenzZ Oh, you can just attach the DataFrame here as a TSV or CSV file.
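A minimal sketch of that export with pandas, assuming each record is a `(utr_id, DataFrame)` tuple as in the `print(a)` output above (the helper name `export_first_record` is hypothetical):

```python
import pickle

import pandas as pd


def export_first_record(pkl_path, csv_path):
    """Write the DataFrame of the first record in a chunk file to CSV; return its UTR id."""
    with open(pkl_path, "rb") as fh:
        # Assumes the record is a (utr_id, DataFrame) tuple, as printed above.
        utr_id, df = pickle.load(fh)
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(df).__name__}")
    # The index in the printout appears to be a plain row counter, so it is dropped.
    df.to_csv(csv_path, index=False)
    return utr_id
```

For example, `export_first_record("possorted_genome_bam.100.112.100.input.pkl", "test_input.csv")` writes the CSV and returns the UTR id, which is worth pasting alongside the attachment since the CSV does not contain it. NaN values are written as empty fields.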

wangzhenzZ commented 4 months ago

test_input.csv

chengl7 commented 3 months ago

@ThuyTien1, do you have time to look into this issue?

chengl7 commented 1 month ago

@wangzhenzZ, I have just updated the package. Could you reinstall it, run your analysis again, and see whether the problem is solved?

Dongxu-Zheng commented 1 month ago

Hi, I have the same problem: infer_pa produces no output. After noticing this issue I installed scape using conda, but I am not sure whether I got the latest version. So far it still doesn't work for me.

Best, Dongxu
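Since it was unclear which version conda installed, one way to check what is active in `scape_env` is Python's standard `importlib.metadata`. A minimal sketch; the distribution name `"scape"` is an assumption and may differ from the name the package was actually installed under (the helper name `installed_version` is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="scape"):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No distribution with this name in the active environment.
        return None
```

Running `print(installed_version())` inside the activated environment prints the version string, or `None` if nothing is registered under that name.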

chengl7 commented 5 days ago

Hi Dongxu,

@Dongxu-Zheng, I just tried it and it seems to work. Here are the commands I used:

git clone https://github.com/chengl7-lab/scape.git
conda remove -n scape_env --all
conda env create -f ./scape/mac_environment.yml
conda activate scape_env
cd ./scape/examples/toy-example
# here we use "test" as the output directory; this step takes 6 minutes on my laptop
scape prepare_input --utr_file ./GRCh38_98.csv --cb_file ./barcodes.tsv.gz --bam_file ./example.bam --output_dir ./test --chunksize 100
# infer pA sites
scape infer_pa --pkl_input_file ./test/pkl_input/example.100.1.1.input.pkl --output_dir ./test
# remove spurious pA sites; set --utr_merge to False if you are only interested in one UTR
scape merge_pa --output_dir ./test --utr_merge True
# Extract the pA counts, the result is in ./test/res.gene.cnt.tsv.gz
scape ex_pa_cnt_mat --output_dir ./test --res_pkl_file res.gene.pkl
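Once `ex_pa_cnt_mat` finishes, the count matrix can be loaded with pandas, which infers gzip compression from the `.gz` suffix. A minimal sketch, assuming the file is a tab-separated table with row labels in its first column (check the header and adjust `index_col` if it is not); `load_pa_counts` is a hypothetical name:

```python
import pandas as pd


def load_pa_counts(path):
    """Load the tab-separated, gzip-compressed count table written by ex_pa_cnt_mat."""
    # Assumption: the first column holds row labels; compression is inferred from ".gz".
    return pd.read_csv(path, sep="\t", index_col=0)
```

For example, `cnt = load_pa_counts("./test/res.gene.cnt.tsv.gz")` followed by `print(cnt.shape)` is a quick check that the pipeline produced a non-empty matrix.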

Could you try the steps above and see if they work? If not, could you share your commands and error messages?

best regards, Lu

chengl7-lab/scape, issue #4