GoekeLab / xpore

Identification of differential RNA modifications from nanopore direct RNA sequencing
https://xpore.readthedocs.io/
MIT License

Errors while running xpore dataprep #110

Closed JBerthelier closed 8 months ago

JBerthelier commented 2 years ago

Dear GoekeLab,

I am trying to run xpore on our institute's cluster. Everything goes well with the demo data, but I get the following error/warning when running xpore dataprep on my own data. Do you have any idea what causes it and how to fix it?


Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS" (64)
/home/mycomputer/.local/lib/python3.7/site-packages/xpore-2.1-py3.7.egg/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/home/mycomputer/.local/lib/python3.7/site-packages/xpore-2.1-py3.7.egg/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead


Best regards,

Jeremy

yuukiiwa commented 2 years ago

Hi Jeremy,

Thanks for reaching out! It would be great if you could provide the command you used for running xpore dataprep. In the meantime, you can also look into the following two things:

  1. After you see this error/warning, is xpore dataprep still writing the dataprep/data.json file (check whether it keeps growing with ls -lh dataprep/data.json)? If yes, xpore dataprep is still running fine.
  2. What value did you pass to --n_processes? Is it larger than your environment variable "NUMEXPR_MAX_THREADS"? If yes, you might want to either set --n_processes to a smaller value or increase the value of "NUMEXPR_MAX_THREADS".
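The second check can be sketched in Python. This is only an illustration: the helper name thread_budget_ok is hypothetical, and it assumes that the value passed to --n_processes is the number that ends up being compared with NUMEXPR_MAX_THREADS, which this thread does not confirm.

```python
import os


def thread_budget_ok(n_processes, environ=None):
    """Return (ok, limit) for a requested process count.

    limit is the integer value of NUMEXPR_MAX_THREADS, or None when the
    variable is unset or not an integer. In that case this sketch makes
    no claim and reports ok=True.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("NUMEXPR_MAX_THREADS")
    try:
        limit = int(raw)
    except (TypeError, ValueError):
        return True, None
    return n_processes <= limit, limit


if __name__ == "__main__":
    # 8 is an arbitrary example value, not a recommendation.
    ok, limit = thread_budget_ok(n_processes=8)
    print(f"NUMEXPR_MAX_THREADS={limit!r}, request fits={ok}")
```

If the check reports that the request does not fit, the two options above apply: lower --n_processes, or raise NUMEXPR_MAX_THREADS in the job environment before the program starts.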

Best wishes, Yuk Kei

erika-fukuhara commented 2 years ago

Hi Yuk Kei,

I’m working with Jeremy on running xpore dataprep.

Here is the command I used for running xpore dataprep:

xpore dataprep \
--eventalign "eventalign_Araport11_GTF_genes_transposons-col0.txt" \
--gtf_or_gff "Araport11_GTF_genes_transposons_final_xpore.sorted.gtf" \
--transcript_fasta "Araport11_GTF_genes_transposons.fa" \
--out_dir dataprep \
--genome

After seeing the error/warning, xpore dataprep only generated the eventalign.index file. No other output files are generated when I try to run xpore dataprep.
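A quick way to see which outputs exist and whether any are empty is a small Python helper like the one below. It is a sketch only: report_outputs is a hypothetical name, and the list of expected files is taken from the file names mentioned in this thread, so the set written by a given xpore version may differ.

```python
from pathlib import Path

# File names as mentioned in this thread; treat the list as an assumption.
EXPECTED = (
    "eventalign.index",
    "data.json",
    "data.index",
    "data.log",
    "data.readcount",
)


def report_outputs(out_dir, expected=EXPECTED):
    """Map each expected file name to its size in bytes, or None if missing."""
    out = Path(out_dir)
    report = {}
    for name in expected:
        path = out / name
        report[name] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    # "dataprep" matches the --out_dir used in the command above.
    for name, size in report_outputs("dataprep").items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```

Running it twice a few minutes apart shows whether data.json is still growing, which is the same check as ls -lh dataprep/data.json.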

Best, Erika

yuukiiwa commented 2 years ago

Hi Erika,

Thank you for the information! Would you mind showing me the head of eventalign_Araport11_GTF_genes_transposons-col0.txt, Araport11_GTF_genes_transposons_final_xpore.sorted.gtf, and Araport11_GTF_genes_transposons.fa, please? I suspect that this might be due to a customized GTF file.

Thanks!

Best wishes, Yuk Kei

erika-fukuhara commented 2 years ago

Hi Yuk Kei,

Here is the head for the eventalign.txt, GTF, and FASTA files.

eventalign_Araport11_GTF_genes_transposons-col0.txt:

contig  position    reference_kmer  read_index  strand  event_index event_level_mean    event_stdv  event_length    model_kmer  model_mean  model_stdv  standardized_level  start_idx   end_idx
AT1G01020.2 426 TTCTG   29  t   429 78.67   1.821   0.00664 TTCTG   79.59   2.07    -0.36   29062   29082
AT1G01020.2 426 TTCTG   29  t   430 82.91   1.990   0.00332 TTCTG   79.59   2.07    1.32    29052   29062
AT1G01020.2 427 TCTGA   29  t   431 95.35   1.866   0.00232 TCTGA   91.37   2.85    1.15    29045   29052
AT1G01020.2 427 TCTGA   29  t   432 99.25   1.877   0.00631 TCTGA   91.37   2.85    2.27    29026   29045
AT1G01020.2 427 TCTGA   29  t   433 94.57   2.016   0.00266 TCTGA   91.37   2.85    0.92    29018   29026
AT1G01020.2 427 TCTGA   29  t   434 98.04   1.761   0.00797 TCTGA   91.37   2.85    1.92    28994   29018
AT1G01020.2 428 CTGAT   29  t   435 122.09  3.429   0.00730 CTGAT   111.64  4.49    1.91    28972   28994
AT1G01020.2 428 CTGAT   29  t   436 117.08  2.426   0.00299 CTGAT   111.64  4.49    0.99    28963   28972
AT1G01020.2 429 TGATT   29  t   437 136.43  6.966   0.00266 TGATT   127.73  5.10    1.40    28955   28963

Araport11_GTF_genes_transposons_final_xpore.sorted.gtf:

1   Araport11   transcript  3631    5899    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    3631    3913    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    3996    4276    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    4486    4605    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    4706    5095    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    5174    5326    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    5439    5899    .   +   .   gene_id "AT1G01010"; transcript_id "AT1G01010.1";
1   Araport11   exon    6788    7069    .   -   .   gene_id "AT1G01020"; transcript_id "AT1G01020.2";
1   Araport11   exon    6788    7069    .   -   .   gene_id "AT1G01020"; transcript_id "AT1G01020.6";
1   Araport11   exon    6788    7069    .   -   .   gene_id "AT1G01020"; transcript_id "AT1G01020.1";

Araport11_GTF_genes_transposons.fa:

>AT1G01010.1
AAATTATTAGATATACCAAACCAGAGAAAACAAATACATAATCGGAGAAATACAGATTACAGAGAGCGAG
AGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCA
AGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCTCCGTAACAAAATCGAAGGA
AACACTAGCCGCGACGTTGAAGTAGCCATCAGCGAGGTCAACATCTGTAGCTACGATCCTTGGAACTTGC
GCTTCCAGTCAAAGTACAAATCGAGAGATGCTATGTGGTACTTCTTCTCTCGTAGAGAAAACAACAAAGG
GAATCGACAGAGCAGGACAACGGTTTCTGGTAAATGGAAGCTTACCGGAGAATCTGTTGAGGTCAAGGAC
CAGTGGGGATTTTGTAGTGAGGGCTTTCGTGGTAAGATTGGTCATAAAAGGGTTTTGGTGTTCCTCGATG
GAAGATACCCTGACAAAACCAAATCTGATTGGGTTATCCACGAGTTCCACTACGACCTCTTACCAGAACA
TCAGAGGACATATGTCATCTGCAGACTTGAGTACAAGGGTGATGATGCGGACATTCTATCTGCTTATGCA

Thank you, Erika

yuukiiwa commented 2 years ago

Hi Erika,

Thank you for sharing the eventalign.txt, GTF, and FASTA files! Those should be compatible with xpore dataprep.
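One way to sanity-check that compatibility yourself is to compare identifiers across the three files. The sketch below assumes that the contig column of eventalign.txt should match both the transcript_id values in the GTF and the sequence names in the FASTA; that requirement is an assumption here rather than something stated in this thread, and all function names are hypothetical.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Sequence names: the first whitespace-delimited token after '>'."""
    return {ln[1:].split()[0] for ln in lines
            if ln.startswith(">") and ln[1:].strip()}


def gtf_transcript_ids(lines):
    """transcript_id values found in the GTF attribute column."""
    return {m.group(1) for ln in lines if not ln.startswith("#")
            for m in TRANSCRIPT_ID.finditer(ln)}


def eventalign_contigs(lines):
    """First column of an eventalign table, skipping the header row."""
    rows = (ln.split() for ln in lines)
    return {r[0] for r in rows if r and r[0] != "contig"}


def compare_ids(eventalign, gtf, fasta):
    """Report eventalign contigs that are absent from the GTF or the FASTA."""
    contigs = eventalign_contigs(eventalign)
    return {
        "missing_from_gtf": contigs - gtf_transcript_ids(gtf),
        "missing_from_fasta": contigs - fasta_ids(fasta),
    }
```

Each argument can be an open file handle or a list of lines. Because eventalign files are large, passing only the first few thousand lines is usually enough for a quick look. Two empty sets mean every sampled contig was found in both annotation files.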

I think you should look into the first line of the error message, Error. nthreads cannot be larger than environment variable "NUMEXPR_MAX_THREADS". Contacting the cluster maintainers at your institute should help with that.

Thanks!

Best wishes, Yuk Kei

jeffersmith commented 2 years ago

Hello Yuk Kei,

I am also trying to use xpore dataprep and have encountered the same problem: dataprep/eventalign.index is generated, but data.json, data.index, data.log and data.readcount are empty. I have no idea what causes this; may I ask for your help? The command I used to run xpore dataprep is:

xpore dataprep \
--eventalign data/${file}/nanopolish/eventalign.txt \
--gtf_or_gff all.gtf \
--transcript_fasta ref.fa \
--out_dir data/${file}/dataprep \
--genome

I got this error:

/mycomputer/miniconda3/lib/python3.9/site-packages/xpore-2.1-py3.9.egg/xpore/scripts/dataprep.py:21: PerformanceWarning: indexing past lexsort depth may impact performance.
  pos_end += eventalign_result.loc[index]['line_length'].sum()
/mycomputer/miniconda3/lib/python3.9/site-packages/xpore-2.1-py3.9.egg/xpore/scripts/dataprep.py:72: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

My eventalign.txt, GTF, and FASTA files all look like @erika-fukuhara's. Did you solve this problem, or do you have any suggestions?

Thank you! Jeffer

acarmas1 commented 2 years ago

Hey,

I'm having the same problem. I ran xpore dataprep, but data.json, data.log and the other output files are empty.

Do you know how we can fix it?