faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_probe_slice_sequence_from_genomes: KeyError: 'chr1' #221

Closed: nealplatt closed this issue 3 years ago

nealplatt commented 3 years ago

I am using phyluce 1.7.0, installed via the Linux conda yml. I have been running into an error when I try to extract UCE regions from genomes: I get a KeyError, as if the genome hasn't been loaded. I have tried this with my own data... and even with the data from the phyluce tutorial page.

The command and output are below:

(phyluce-1.7.0) $ phyluce_probe_slice_sequence_from_genomes \
>     --lastz tutorial3-genome-lastz \
>     --conf genomes.conf \
>     --flank 500 \
>     --name-pattern "uce-5k-probes.fasta_v_{}.lastz.clean" \
>     --output tutorial-genome-fasta
[WARNING] Output directory exists, REMOVE [Y/n]: Y
2021-03-11 16:56:10,713 - phyluce_probe_slice_sequence_from_genomes - INFO - ======= Starting phyluce_probe_slice_sequence_from_genomes ======
2021-03-11 16:56:10,714 - phyluce_probe_slice_sequence_from_genomes - INFO - Version: 1.7.0
2021-03-11 16:56:10,714 - phyluce_probe_slice_sequence_from_genomes - INFO - Commit: None
2021-03-11 16:56:10,714 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --conf: /master/nplatt/uce-genome/genomes.conf
2021-03-11 16:56:10,714 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --contig_orient: False
2021-03-11 16:56:10,714 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --exclude: None
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --flank: 500
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --lastz: /master/nplatt/uce-genome/tutorial3-genome-lastz
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --log_path: None
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --output: /master/nplatt/uce-genome/tutorial-genome-fasta
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --pattern: uce-5k-probes.fasta_v_{}.lastz.clean
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --probe_prefix: uce-
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --probe_regex: ^({}\d+)(?:_p\d+.*)
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --probes: None
2021-03-11 16:56:10,715 - phyluce_probe_slice_sequence_from_genomes - INFO - Argument --verbosity: INFO
2021-03-11 16:56:10,716 - phyluce_probe_slice_sequence_from_genomes - INFO - =================== Starting Phyluce: Slice Sequence ===================
2021-03-11 16:56:10,718 - phyluce_probe_slice_sequence_from_genomes - INFO - ------------------- Working on galGal4 genome -------------------
2021-03-11 16:56:10,719 - phyluce_probe_slice_sequence_from_genomes - INFO - Reading galGal4 genome
Traceback (most recent call last):
  File "/master/nplatt/miniconda3/envs/phyluce-1.7.0/bin/phyluce_probe_slice_sequence_from_genomes", line 400, in <module>
    main()
  File "/master/nplatt/miniconda3/envs/phyluce-1.7.0/bin/phyluce_probe_slice_sequence_from_genomes", line 364, in main
    args.probes,
  File "/master/nplatt/miniconda3/envs/phyluce-1.7.0/bin/phyluce_probe_slice_sequence_from_genomes", line 153, in slice_and_return_fasta
    if max + flank < len(tb[name]):
  File "/master/nplatt/miniconda3/envs/phyluce-1.7.0/lib/python3.6/site-packages/bx/seq/twobit.py", line 83, in __getitem__
    seq = self.index[name]
KeyError: 'chr1'
(phyluce-1.7.0) $ 

My genomes.conf, in case it is useful:

(phyluce-1.7.0) $ cat genomes.conf
[scaffolds]
galGal4:/master/nplatt/uce-genome/galGal4/galGal4.2bit
allMis2:/master/nplatt/uce-genome/allMis2/allMis2.2bit

...and the directory structure:

(phyluce-1.7.0) $ ls *
genomes.conf  phyluce_probe_run_multiple_lastzs_sqlite.log  phyluce_probe_slice_sequence_from_genomes.log  tutorial3.sqlite  uce-5k-probes.fasta

allMis2:
allMis2.2bit  GCA_001541155.1_Algmis_Hirise_1.0_genomic.fna  sizes.tab

galGal4:
galGal4.2bit  sizes.tab

tutorial3-genome-lastz:
uce-5k-probes.fasta_v_allMis2.lastz.clean  uce-5k-probes.fasta_v_galGal4.lastz.clean

tutorial-genome-fasta:
galgal4.fasta
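The `--name-pattern` value appears to be filled in with each genome name from the `[scaffolds]` section (for example `uce-5k-probes.fasta_v_galGal4.lastz.clean`), so a quick pre-flight check can confirm that every 2bit path in genomes.conf exists and that a matching lastz file is present. This is a minimal sketch, assuming the conf file is read with Python's standard `configparser`; `check_genomes_conf` is a hypothetical helper, not part of phyluce.

```python
import configparser
import os


def check_genomes_conf(conf_path, lastz_dir, pattern):
    """Hypothetical pre-flight check: list missing 2bit or lastz files.

    Assumes genomes.conf has a [scaffolds] section of name:path entries and
    that `pattern` contains one "{}" placeholder for the genome name.
    """
    config = configparser.ConfigParser()
    # configparser lowercases option names by default; keep "galGal4" as written.
    config.optionxform = str
    config.read(conf_path)
    problems = []
    for name, twobit_path in config.items("scaffolds"):
        if not os.path.isfile(twobit_path):
            problems.append("{}: missing 2bit file {}".format(name, twobit_path))
        # Expand the name pattern the same way the lastz file names above are built.
        lastz_path = os.path.join(lastz_dir, pattern.format(name))
        if not os.path.isfile(lastz_path):
            problems.append("{}: missing lastz file {}".format(name, lastz_path))
    return problems
```

For the layout shown above, `check_genomes_conf("genomes.conf", "tutorial3-genome-lastz", "uce-5k-probes.fasta_v_{}.lastz.clean")` would be expected to return an empty list, which points away from missing inputs and toward how the 2bit sequences are looked up.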

I checked, and chr1 is in both the 2bit and the lastz run:

(phyluce-1.7.0) $ head -n 1 galGal4/sizes.tab
chr1    195276750
(phyluce-1.7.0) $ grep chr1 tutorial3-genome-lastz/uce-5k-probes.fasta_v_galGal4.lastz.clean | head
11298   chr1    +       152936631 152936751     120     >uce-3_p1 |source:faircloth,probes-id:9417,probes-locus:3,probes-probe:1        +       0       120     120     ........................................................................................................................     120M    120/120 100.0   120/120 100.0   120/120 100.0
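The traceback fails on `if max + flank < len(tb[name])`, that is, while working out how far the flanked region can extend along chr1, which requires looking up `chr1` in the 2bit index. With `--flank 500`, the chr1 hit above (152936631 to 152936751, 120 bp) and the chr1 length from sizes.tab, the arithmetic is simple. The sketch below is a hypothetical helper, not phyluce's actual code, and it assumes the flank is added on both sides and clipped to the sequence bounds.

```python
def flanked_region(start, end, flank, seq_len):
    """Widen [start, end) by `flank` on each side, clipped to [0, seq_len].

    Hypothetical helper; assumes 0-based, end-exclusive coordinates like the
    zstart/end columns in the lastz output.
    """
    new_start = max(start - flank, 0)
    new_end = min(end + flank, seq_len)
    return new_start, new_end


# chr1 hit from the lastz line above, --flank 500, chr1 length from sizes.tab
print(flanked_region(152936631, 152936751, 500, 195276750))
```

This prints `(152936131, 152937251)`. The region is well inside chr1, so the failure has nothing to do with the coordinates themselves; it happens as soon as the sequence length is requested by name.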

This problem also occurs with my actual dataset. I have been working through the code to figure out why the genome sequences aren't being loaded into the dictionary, but haven't had any luck.

Thanks for your help in advance.

Neal
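Because the KeyError is raised inside bx-python (`seq = self.index[name]` in `bx/seq/twobit.py`), and sizes.tab is a separate file from the 2bit itself, one way to narrow this down is to read the sequence names straight from the 2bit header without bx-python and check for `chr1`. This is a minimal sketch, assuming the UCSC .2bit layout (a 16-byte header of signature, version, sequence count and reserved field, followed by one name/offset index entry per sequence) and only handling version 0 files with 32-bit offsets; `read_twobit_names` is a hypothetical helper, not part of phyluce or bx-python.

```python
import struct

TWOBIT_SIGNATURE = 0x1A412743


def read_twobit_names(handle):
    """Return the sequence names stored in a 2bit file's index, in file order.

    `handle` is a binary file-like object positioned at the start of the file.
    """
    header = handle.read(16)
    if len(header) != 16:
        raise ValueError("file too short to be a 2bit file")
    # The signature reveals the byte order used for every multi-byte field.
    for order in ("<", ">"):
        signature, version, count, _reserved = struct.unpack(order + "4I", header)
        if signature == TWOBIT_SIGNATURE:
            break
    else:
        raise ValueError("not a 2bit file (bad signature)")
    if version != 0:
        raise ValueError("this sketch only handles version 0 (32-bit offset) files")
    names = []
    for _ in range(count):
        name_size = handle.read(1)[0]
        # Names are stored as raw bytes; decode so they compare equal to "chr1".
        names.append(handle.read(name_size).decode("ascii"))
        handle.read(4)  # skip the 32-bit offset to the sequence record
    return names
```

Used as `with open("galGal4/galGal4.2bit", "rb") as fh: print("chr1" in read_twobit_names(fh))`, a `True` result would suggest the name is present in the file and that the problem lies in how the key is looked up rather than in the genome file itself.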

brantfaircloth commented 3 years ago

hi neal,

i’ll check on this tomorrow. pretty sure I know what the issue is.

-b

brantfaircloth commented 3 years ago

Just need to get a new package up... will take a few minutes.

brantfaircloth commented 3 years ago

conda activate phyluce-1.7.0
conda update phyluce

This should update you to phyluce-1.7.1, in which the only substantial changes are a CLI formatting fix and the fix for your error.

nealplatt commented 3 years ago

Wow. Thanks for taking care of this. Everything works now.