GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License
132 stars 22 forks

xpore v2.0 dataprep ERROR #72

Closed DrDaedalusWHU closed 9 months ago

DrDaedalusWHU commented 3 years ago

Hi developer, I'm trying to use the practice demo data downloaded from https://zenodo.org/record/5103099/files/demo.tar.gz, and I have successfully installed the latest version of xpore (version 2.0). At the dataprep step I keep getting the same error and cannot proceed; something seems to be wrong with the index.

My working directory: ~/bioinfo/xpore_practice/demo/data/HEK293T-WT-rep1

The running command:

xpore dataprep \
--eventalign nanopolish/eventalign.txt \
--gtf_path_or_url demo.gtf \
--transcript_fasta_paths_or_urls demo.fa \
--out_dir dataprep \
--genome

And then came this ERROR:

/home/huangkx/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py:3607: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._set_item(key, value)
/home/huangkx/miniconda3/lib/python3.9/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
Traceback (most recent call last):
File "/home/huangkx/miniconda3/bin/xpore", line 33, in <module>
sys.exit(load_entry_point('xpore==2.0', 'console_scripts', 'xpore')())
File "/home/huangkx/miniconda3/lib/python3.9/site-packages/xpore/scripts/xpore.py", line 67, in main
options.func(options)
File "/home/huangkx/miniconda3/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 688, in dataprep
merge_transcript_id_version = check_gene_tx_id_version(gtf_path_or_url)
File "/home/huangkx/miniconda3/lib/python3.9/site-packages/xpore/scripts/dataprep.py", line 635, in check_gene_tx_id_version
if ln[2] == "transcript" or ln[2] == "exon":
IndexError: list index out of range

I would be grateful if you could give me some advice on solving this problem. Thanks for developing this nice tool.

yuukiiwa commented 3 years ago

Hi @DrDaedalusWHU,

Thank you so much for leaving an issue. The demo.gtf and demo.fa files in Zenodo's demo.tar.gz are in HTML format; @ploy-np will update those files on Zenodo soon. Here are the links to the demo.fa and demo.gtf files.

Best, Yuk Kei
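
The IndexError in the original traceback is consistent with a file that is not tab-delimited GTF: splitting an HTML line on tabs yields a single field, so `ln[2]` does not exist. Below is a minimal sketch of a pre-flight check one could run before `xpore dataprep`. The helper name `looks_like_gtf` and the `max_lines` limit are illustrative assumptions and are not part of xpore.

```python
from pathlib import Path


def looks_like_gtf(path, max_lines=50):
    """Return True if the first non-comment lines look like GTF records.

    Hypothetical helper, not part of xpore. A GTF record has 9
    tab-separated fields; an HTML page saved under a .gtf name does not.
    """
    checked = 0
    with Path(path).open("r", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and GTF header comments
            if len(line.split("\t")) != 9:
                return False  # wrong field count: not a GTF record
            checked += 1
            if checked >= max_lines:
                break
    # An empty or comment-only file is not treated as valid GTF.
    return checked > 0
```

For example, `looks_like_gtf("demo.gtf")` returning False would suggest re-downloading the file before running dataprep.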

DrDaedalusWHU commented 3 years ago

Thanks for your kind reply! I will try the newly uploaded demo files.

ploy-np commented 2 years ago

Hi @DrDaedalusWHU,

FYI, the files are all up-to-date. Perhaps try using wget. Let me know if you still have any problem.

al3xMlt commented 2 years ago

I got the same problem. Using only --eventalign and --out_dir, the following message is shown during dataprep (after a couple of transcripts have already been processed):

/home/amalt/miniconda3/envs/xpore/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/amalt/miniconda3/envs/xpore/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)

DrDaedalusWHU commented 2 years ago

The new data you provided worked out smoothly. Thanks!

ploy-np commented 2 years ago

Hi @DrDaedalusWHU,

I have already updated the demo.tar.gz file. Could you please try again? Note that the download link has changed to https://zenodo.org/record/5162402/files/demo.tar.gz
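
After downloading the archive, one way to confirm that it no longer contains HTML pages in place of data files is to inspect the first bytes of each member. This is only a sketch: the helper name `html_members` and the 512-byte probe size are assumptions made for illustration, and nothing here is part of xpore.

```python
import tarfile


def html_members(archive_path, probe_bytes=512):
    """List regular files in a tar archive whose content starts like HTML.

    Hypothetical helper, not part of xpore.
    """
    suspicious = []
    # "r:*" lets tarfile detect the compression (gzip for .tar.gz).
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and links
            handle = archive.extractfile(member)
            head = handle.read(probe_bytes).lstrip().lower()
            if head.startswith((b"<!doctype html", b"<html")):
                suspicious.append(member.name)
    return suspicious
```

An empty list from `html_members("demo.tar.gz")` would indicate that no member begins with an HTML header.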

Wardale24 commented 2 years ago

Hello,

I wanted to add to this issue instead of creating a new one. I first got the same error as the creator of the issue. I then updated the demo.gtf and demo.fa files as described by @yuukiiwa, but continued to get the same error. Finally, I ran wget https://zenodo.org/record/5162402/files/demo.tar.gz as recommended by @ploy-np and got a slightly different error from the one previously mentioned.

xpore dataprep --eventalign demo/data/HEK293T-WT-rep1/nanopolish/eventalign.txt --gtf_path_or_url demo/demo.gtf --transcript_fasta_paths_or_urls demo/demo.fa --out_dir dataprep_demo2 --genome

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()

Has this issue been resolved somehow?

Cheers!

yuukiiwa commented 2 years ago

Hi @Wardale24,

/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
chunk_split['line_length'] = np.array(lines)
/home/alex/.local/lib/python3.8/site-packages/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
pos_end += eventalign_result.loc[index]['line_length'].sum()

This is a warning message, not an error message, and you should find that the xpore dataprep outputs in dataprep_demo2 are not empty. We get this message every time we run xpore dataprep. Thanks!

Best wishes, Yuk Kei
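
To illustrate the distinction: a Python warning is reported but does not stop the program, so the practical check is whether the output directory contains non-empty files. The sketch below uses only the standard library; the function names are hypothetical, a generic `UserWarning` stands in for the pandas warnings, and no particular xpore output file names are assumed.

```python
import warnings
from pathlib import Path


def run_with_warning():
    """Emit a warning, then keep going: warnings do not abort execution."""
    warnings.warn(
        "indexing past lexsort depth may impact performance.", UserWarning
    )
    return "finished"


def non_empty_files(out_dir):
    """Return sorted names of files in out_dir that hold at least one byte."""
    return sorted(
        p.name
        for p in Path(out_dir).iterdir()
        if p.is_file() and p.stat().st_size > 0
    )
```

Calling `non_empty_files("dataprep_demo2")` after the run lists the files that were actually written; an empty list would point to a real failure rather than a warning.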

Wardale24 commented 2 years ago

This is true, @yuukiiwa. I am very sorry for adding this.

Thank you for the response.

yuukiiwa commented 2 years ago

Not a problem at all, @Wardale24 !