GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
MIT License
104 stars 19 forks

m6anet-inference error #121

Open XichenZhao0223 opened 1 year ago

XichenZhao0223 commented 1 year ago

Dear developers, Thank you for developing this tool. I have encountered two errors when running the inference step on two datasets.

The version I use is 2.0.2.

I ran the following commands for both datasets:

find path/to/fastq/ -maxdepth 1 -name "*.fastq" | xargs cat > reads.fastq

seqtk seq -A reads.fastq > reads.fasta

nanopolish index -d /path/to/single/fast5 reads.fasta -s /path/to/sequencing_summary.txt

minimap2 -ax map-ont -t 6 hg19_UCSC_knownGene.fasta reads.fasta | samtools sort -T *.tmp -o reads.sorted.bam

samtools index reads.sorted.bam

nanopolish eventalign --reads reads.fasta --bam reads.sorted.bam --genome hg19_UCSC_knownGene.fasta --scale-events --signal-index --summary each_read_alignment_summary.txt --threads 10 > eventalign.txt

m6anet dataprep --eventalign eventalign.txt --out_dir ./dataprep --n_processes 4

m6anet inference --input_dir ./dataprep --out_dir ./prediction  --n_processes 4 --num_iterations 1000

The data preparation folder contains 4 files: data.info data.json data.log eventalign.index

Error message for the first error:

Traceback (most recent call last):
  File "/home/Xichen.Zhao/.local/bin/m6anet", line 8, in <module>
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/__init__.py", line 30, in main
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/scripts/inference.py", line 86, in main
    ds = NanopolishDS(input_dir[0], DEFAULT_MIN_READS, args.norm_path, mode='Inference')
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 100, in __init__
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 109, in set_feature_indices
    self.total_neighboring_features = self.get_total_neighboring_features()
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 149, in get_total_neighboring_features
    kmer, _ = self._load_data(self.data_fpath, tx_id, tx_pos, start_pos, end_pos)
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 185, in _load_data
    pos_info = json.loads(json_str)[tx_id][str(tx_pos)]
KeyError: 22

The preview of data.info:


The preview of data.json:


Error message for the second error:

Traceback (most recent call last):
  File "/home/Xichen.Zhao/.local/bin/m6anet", line 8, in <module>
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/__init__.py", line 30, in main
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/scripts/inference.py", line 86, in main
    ds = NanopolishDS(input_dir[0], DEFAULT_MIN_READS, args.norm_path, mode='Inference')
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 100, in __init__
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 109, in set_feature_indices
    self.total_neighboring_features = self.get_total_neighboring_features()
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 149, in get_total_neighboring_features
    kmer, _ = self._load_data(self.data_fpath, tx_id, tx_pos, start_pos, end_pos)
  File "/home/Xichen.Zhao/.local/lib/python3.8/site-packages/m6anet/utils/data_utils.py", line 184, in _load_data
    json_str = f.read(end_pos - start_pos)
TypeError: argument should be integer or None, not 'float'

The preview of data.info:


The preview of data.json:


Could you please help me with these two issues? Thank you for your time, and I look forward to hearing from you.

chrishendra93 commented 1 year ago

hi @XichenZhao0223, at a glance, the error seems to be caused by a datatype mismatch between transcript_id in data.info and in data.json. I suspect that your transcript_id is not stored as a string in data.info but is stored as a string in data.json. An easy fix is to manually convert the transcript_id column in data.info to string. Alternatively, could you please send one pair of data.info and data.json files so that I can verify this? Thanks!
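A minimal sketch of how such a mismatch could be checked outside m6anet. It assumes that data.info is a comma-separated file with the columns transcript_id, transcript_position, start and end, and that start and end are byte offsets of one JSON record in data.json keyed first by transcript ID and then by position (the access pattern visible in the traceback). The function name check_data_index is hypothetical and not part of m6anet.

```python
import csv
import json


def check_data_index(info_path, json_path):
    """Return (row_number, problem) tuples for data.info rows that do not
    match the record stored at their offsets in data.json.

    Hypothetical helper; column names and file layout are assumptions."""
    problems = []
    with open(info_path, newline="") as info, open(json_path, "rb") as data:
        # csv.DictReader keeps every field as a string, so an ID such as
        # "22" is never silently converted to the integer 22.
        for row_number, row in enumerate(csv.DictReader(info), start=1):
            tx_id = row["transcript_id"]
            tx_pos = row["transcript_position"]
            try:
                # Offsets must be whole numbers; a value like "120.0"
                # suggests the column was stored or re-saved as float.
                start, end = int(row["start"]), int(row["end"])
            except ValueError:
                problems.append((row_number, "non-integer start/end offset"))
                continue
            data.seek(start)
            try:
                record = json.loads(data.read(end - start).decode("utf-8"))
            except ValueError:
                problems.append((row_number, "offsets do not delimit valid JSON"))
                continue
            if tx_id not in record:
                problems.append((row_number, f"transcript_id {tx_id!r} not in record"))
            elif tx_pos not in record[tx_id]:
                problems.append((row_number, f"position {tx_pos!r} not in record"))
    return problems
```

Calling check_data_index("dataprep/data.info", "dataprep/data.json") and getting an empty list would suggest that the index and the JSON file agree when IDs are treated as strings; any reported row points to where the two files diverge.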

XichenZhao0223 commented 1 year ago

Dear developer,

Thank you for your patient reply. I tried converting trans_id in data.info to character type in R, but unfortunately the error persists. Below is the Dropbox link for the two files: https://www.dropbox.com/s/dpx6089nk3pu74c/Xichen0223.zip?dl=0 Although these are not the exact replicate data files mentioned above, they show the same error in the inference step.

AndreaYCT commented 7 months ago

Hi, @XichenZhao0223 @chrishendra93

I think I am having a similar error. Has the problem been solved?

Many thanks!


XichenZhao0223 commented 7 months ago

Dear Andrea, Unfortunately, the issue has not been solved. May I ask which transcript reference file you used for read alignment and nanopolish eventalign? I encountered this error when using the UCSC transcript file (hg19_UCSC_knownGene.fasta), and the error seemed to be related to the transcript IDs of the UCSC transcripts. So I switched to the Ensembl transcript file (Homo_sapiens.GRCh38.cdna.all.fa), and that solved the problem. I hope this information helps with your problem.
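As a rough way to see how the transcript IDs of a reference relate to what ends up in data.info, the sketch below compares the two. It assumes that a FASTA sequence ID is the text after ">" up to the first whitespace (the convention followed by minimap2 and samtools) and that data.info is a comma-separated file with a transcript_id column. The function names are hypothetical, and this check does not by itself confirm the cause of the error.

```python
import csv


def fasta_ids(fasta_path):
    """Collect sequence IDs: the text after '>' up to the first whitespace."""
    ids = set()
    with open(fasta_path) as fasta:
        for line in fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def unknown_transcripts(info_path, fasta_path):
    """Return transcript_id values from data.info that are not FASTA IDs.

    Hypothetical helper; the transcript_id column name is an assumption."""
    known = fasta_ids(fasta_path)
    with open(info_path, newline="") as info:
        # Fields are kept as strings so IDs are compared verbatim.
        seen = {row["transcript_id"] for row in csv.DictReader(info)}
    return sorted(seen - known)
```

A non-empty result from unknown_transcripts("dataprep/data.info", "hg19_UCSC_knownGene.fasta") would indicate that some IDs were altered or truncated somewhere between the reference and the dataprep output.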

Regards, Xichen

AndreaYCT commented 7 months ago

hi, @XichenZhao0223

Thanks for the quick reply. My reference is from GENCODE. I will switch to Ensembl and let you know the result.

Your info is much appreciated!


AndreaYCT commented 7 months ago

Hi, @XichenZhao0223,

I got it working with the Ensembl reference!

Thank you again for your suggestion and experience!!!!!!!


XichenZhao0223 commented 7 months ago

Hi Andrea,

You're welcome, and congratulations!! 😃

Regards, Xichen