adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

[E::idx_find_and_load] Could not retrieve index file for '13' #4

Closed Coracollar closed 3 years ago

Coracollar commented 3 years ago

Hi, I have used tldr (I am interested in methylation analysis, so I used --detail_output) and it seems to have worked for the most part, but the message "[E::idx_find_and_load] Could not retrieve index file for '13'" keeps repeating. Any idea what might be happening? Thanks, Cora

2020-11-04 13:04:53,241 te-ont started with command: /home/coracollar/anaconda3/envs/tldr/bin/tldr -b /data/gpfs/projects/punim1048/allbarcoded/Control/C3/C3.sorted.bam -e /data/gpfs/projects/punim1048/TEs/tldr/ref/teref.mouse.fa -r /data/gpfs/projects/punim1048/GRCm38_genome/genome.fa -n /data/gpfs/projects/punim1048/TEs/tldr/ref/nonref.collection.mm10.chr.bed.gz --color_consensus --detail_output
2020-11-04 13:04:53,241 output basename: C3.sorted
2020-11-04 13:07:24,434 writing clusters to C3.sorted/10.pickle
2020-11-04 13:09:51,236 writing clusters to C3.sorted/11.pickle
2020-11-04 13:12:10,352 writing clusters to C3.sorted/12.pickle
2020-11-04 13:14:34,461 writing clusters to C3.sorted/13.pickle
2020-11-04 13:17:03,112 writing clusters to C3.sorted/14.pickle
2020-11-04 13:19:08,453 writing clusters to C3.sorted/15.pickle
2020-11-04 13:21:06,983 writing clusters to C3.sorted/16.pickle
2020-11-04 13:23:12,015 writing clusters to C3.sorted/17.pickle
2020-11-04 13:25:02,695 writing clusters to C3.sorted/18.pickle
2020-11-04 13:26:13,077 writing clusters to C3.sorted/19.pickle
2020-11-04 13:30:21,517 writing clusters to C3.sorted/1.pickle
2020-11-04 13:34:41,072 writing clusters to C3.sorted/2.pickle
2020-11-04 13:37:53,024 writing clusters to C3.sorted/3.pickle
2020-11-04 13:41:09,641 writing clusters to C3.sorted/4.pickle
2020-11-04 13:44:14,498 writing clusters to C3.sorted/5.pickle
2020-11-04 13:47:22,443 writing clusters to C3.sorted/6.pickle
2020-11-04 13:49:55,378 writing clusters to C3.sorted/7.pickle
2020-11-04 13:52:12,211 writing clusters to C3.sorted/8.pickle
2020-11-04 14:00:48,804 writing clusters to C3.sorted/9.pickle
2020-11-04 14:00:53,495 writing clusters to C3.sorted/MT.pickle
2020-11-04 14:02:27,087 writing clusters to C3.sorted/X.pickle
2020-11-04 14:03:42,954 writing clusters to C3.sorted/Y.pickle
2020-11-04 14:03:45,377 loaded 35483 clusters from C3.sorted/10.pickle
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'

[....]

[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
2020-11-04 14:08:15,620 finished C3.sorted/10.pickle. wrote 29 records to C3.sorted.table.txt
2020-11-04 14:08:17,814 loaded 32873 clusters from C3.sorted/11.pickle
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'
[E::idx_find_and_load] Could not retrieve index file for '13'

and so on...

adamewing commented 3 years ago

Hm, it's not obvious to me offhand. Could you check whether the chromosome names in the .bam file and the genome reference .fasta file use a 'chr' prefix?

Also, if your intention is to try out the methylation functions downstream of tldr, you'll probably want to use the --extend_consensus option. 10000 or 20000 is probably a good value, depending on how much context you want to see upstream and downstream of the insertions.
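For the reference side of the prefix check, a minimal Python sketch is below. It reads only the FASTA header lines. `fasta_contig_names` and `uses_chr_prefix` are hypothetical helper names, not part of tldr, and the path in the usage note is a placeholder.

```python
from typing import Iterable, List


def fasta_contig_names(path: str) -> List[str]:
    """Return the contig name from each '>' header line of a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # The name is the first whitespace-delimited token after '>'.
                fields = line[1:].split()
                if fields:
                    names.append(fields[0])
    return names


def uses_chr_prefix(names: Iterable[str]) -> bool:
    """True if any contig name starts with 'chr'."""
    return any(name.startswith("chr") for name in names)
```

Calling `uses_chr_prefix(fasta_contig_names("genome.fa"))` would then report whether the reference uses the prefix. The same `uses_chr_prefix` check applies to any list of names taken from the .bam header.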

Coracollar commented 3 years ago

Hi, Thanks for your response. I will add the --extend_consensus option. As for the 'chr' part, my reference genome does not have them, so I imagine the .bam file doesn't either (although it is a binary file, so I cannot check that, or at least I don't know how to).

Thanks, Cora


adamewing commented 3 years ago

If it's the same ref used to generate the .bam file you're probably right but samtools idxstats should indicate what the chromosome / contig names are.

Is your .bam indexed (i.e. is there a .bai file in the directory with the same name as the .bam)?
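Both checks can be sketched in Python. The snippet assumes the output of `samtools idxstats` has already been captured as text (columns: reference name, length, mapped reads, unmapped reads) and that the index is named either `<name>.bam.bai` or `<name>.bai`. The function names are hypothetical.

```python
import os
from typing import List, Optional


def contigs_from_idxstats(text: str) -> List[str]:
    """Extract contig names from captured `samtools idxstats` output."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '*' row that counts unplaced reads.
        if not fields or fields[0] == "*":
            continue
        names.append(fields[0])
    return names


def find_bam_index(bam_path: str) -> Optional[str]:
    """Return the path of a .bai file next to the BAM, or None if absent."""
    candidates = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None
```

If `find_bam_index("C3.sorted.bam")` returns `None`, the file would need indexing with `samtools index` before region queries can work.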


Coracollar commented 3 years ago

Hi, Yes, they do not have the ‘chr’ prefix, just like the reference. And I do have the .bai file.

(Nanopore) [coracollar@spartan-login1 C3]$ samtools idxstats C3.sorted.bam
10 130694993 478872 0
11 122082543 376536 0
12 120129022 385296 0
13 120421639 384289 0
14 124902244 456659 0
15 104043685 320986 0
16 98207768 311632 0
17 94987271 318806 0
18 90702639 285762 0
19 61431566 185610 0
1 195471971 627659 0
2 182113224 788621 0
3 160039680 507456 0
4 156508116 541386 0
5 151834684 512128 0
6 149736546 506149 0
7 145441459 524532 0
8 129401213 427870 0
9 124595110 4930659 0
MT 16299 201 0
X 171031299 413660 0
Y 91744698 861503 0


adamewing commented 3 years ago

Thanks. I've added a new mouse ref to the tldr repository that doesn't use 'chr' prefixes. Try pulling the latest changes (git pull) and then pointing the -n option at nonref.collection.mm10.bed.gz instead of the version with "chr" in the filename. Also, if the machine you are running this on has multiple cores, things will go faster if you set -p to the number of available cores (or some number > 1).
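Whether a given `-n` file matches the alignments can be checked by comparing chromosome names directly. This is a minimal sketch, assuming the BED file is gzip-compressed with the chromosome in the first column and that the .bam contig names are already available as a list. The helper names are hypothetical and not part of tldr.

```python
import gzip
from typing import Iterable, Set


def bed_chromosomes(path: str) -> Set[str]:
    """Collect the distinct first-column values of a gzip-compressed BED file."""
    chroms = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Ignore blank lines and common BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chroms.add(line.split()[0])
    return chroms


def unmatched_chromosomes(bed_chroms: Iterable[str],
                          bam_contigs: Iterable[str]) -> Set[str]:
    """Return BED chromosome names that do not occur among the BAM contigs."""
    return set(bed_chroms) - set(bam_contigs)
```

An empty result from `unmatched_chromosomes` suggests the two files use the same naming scheme. A result full of `chr`-prefixed names points to the mismatch discussed above.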

Coracollar commented 3 years ago

Hi, Thanks for your thoroughness. I have tried it, but the "Could not retrieve index file" message keeps appearing:

2020-11-11 10:56:37,563 te-ont started with command: /home/coracollar/anaconda3/envs/tldr/bin/tldr -b /data/gpfs/projects/punim1048/allbarcoded/Control/C3/C3.sorted.bam -e /data/gpfs/projects/punim1048/TEs/tldr/ref/teref.mouse.fa -r /data/gpfs/projects/punim1048/GRCm38_genome/genome.fa -n /data/gpfs/projects/punim1048/TEs/tldr/ref/nonref.collection.mm10.bed.gz -p 16 --color_consensus --detail_output --extend_consensus 2000
2020-11-11 10:56:37,563 output basename: C3.sorted
2020-11-11 10:57:48,097 writing clusters to C3.sorted/19.pickle
2020-11-11 10:58:28,754 writing clusters to C3.sorted/18.pickle
2020-11-11 10:58:46,017 writing clusters to C3.sorted/16.pickle
2020-11-11 10:58:46,391 writing clusters to C3.sorted/17.pickle
2020-11-11 10:58:47,516 writing clusters to C3.sorted/MT.pickle
2020-11-11 10:59:02,320 writing clusters to C3.sorted/15.pickle
2020-11-11 10:59:05,748 writing clusters to C3.sorted/12.pickle
2020-11-11 10:59:16,998 writing clusters to C3.sorted/11.pickle
2020-11-11 10:59:22,094 writing clusters to C3.sorted/13.pickle
2020-11-11 10:59:29,542 writing clusters to C3.sorted/14.pickle
2020-11-11 10:59:37,297 writing clusters to C3.sorted/10.pickle
2020-11-11 10:59:45,480 writing clusters to C3.sorted/5.pickle
2020-11-11 11:00:10,116 writing clusters to C3.sorted/6.pickle
2020-11-11 11:00:25,135 writing clusters to C3.sorted/3.pickle
2020-11-11 11:00:30,458 writing clusters to C3.sorted/X.pickle
2020-11-11 11:00:30,718 writing clusters to C3.sorted/4.pickle
2020-11-11 11:00:31,683 writing clusters to C3.sorted/Y.pickle
2020-11-11 11:00:37,739 writing clusters to C3.sorted/7.pickle
2020-11-11 11:00:45,385 writing clusters to C3.sorted/1.pickle
2020-11-11 11:01:02,051 writing clusters to C3.sorted/2.pickle
2020-11-11 11:01:02,951 writing clusters to C3.sorted/8.pickle
2020-11-11 11:07:22,118 writing clusters to C3.sorted/9.pickle
2020-11-11 11:07:27,012 loaded 35483 clusters from C3.sorted/10.pickle
[E::idx_find_and_load] Could not retrieve index file for '21'
[E::idx_find_and_load] Could not retrieve index file for '20'
[E::idx_find_and_load] Could not retrieve index file for '23'
[E::idx_find_and_load] Could not retrieve index file for '21'
[E::idx_find_and_load] Could not retrieve index file for '21'
[E::idx_find_and_load] Could not retrieve index file for '20'
[E::idx_find_and_load] Could not retrieve index file for '23'
[E::idx_find_and_load] Could not retrieve index file for '20'
[E::idx_find_and_load] Could not retrieve index file for '23'
[E::idx_find_and_load] Could not retrieve index file for '21'
[E::idx_find_and_load] Could not retrieve index file for '23'
[E::idx_find_and_load] Could not retrieve index file for '20'
[E::idx_find_and_load] Could not retrieve index file for '15'
[E::idx_find_and_load] Could not retrieve index file for '16'
[E::idx_find_and_load] Could not retrieve index file for '15'

It goes on, but at the end it still writes something to the table:

2020-11-11 11:07:55,036 finished C3.sorted/10.pickle. wrote 29 records to C3.sorted.table.txt

So I am not sure what is working and what is not.

Thanks, Cora


adamewing commented 3 years ago

This turned out to be due to a change in HTSlib; it's just a diagnostic message: https://github.com/pysam-developers/pysam/issues/939

Should be fixed in a38312f, re-open if you run into it again.
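With a tldr checkout that predates that commit, the message can be stripped from a captured log so the real progress lines are easier to read. This is a minimal sketch, assuming the log was saved as text and that every diagnostic has the exact form shown in this thread. `strip_idx_diagnostics` is a hypothetical helper, not part of tldr or pysam.

```python
import re

# Matches one diagnostic message plus any whitespace that follows it.
_IDX_DIAGNOSTIC = re.compile(
    r"\[E::idx_find_and_load\] Could not retrieve index file for '[^']*'\s*"
)


def strip_idx_diagnostics(log_text: str) -> str:
    """Remove the HTSlib index diagnostics and drop lines left empty."""
    cleaned = _IDX_DIAGNOSTIC.sub("", log_text)
    kept = [line.rstrip() for line in cleaned.splitlines() if line.strip()]
    return "\n".join(kept)
```

This only tidies the log. It does not change what tldr computes, which matches the observation above that records were still written to the table.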