faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_assembly_match_contigs_to_probes - AttributeError: 'NoneType' object has no attribute 'groups' #233

Open BirdmanRidesAgain opened 3 years ago

BirdmanRidesAgain commented 3 years ago

I've been trying to pull UCEs out of a couple of full genome sequences: a .fna file downloaded from GenBank and a song sparrow assembly our lab had on hand. When I run the phyluce_assembly_match_contigs_to_probes command, following the pipeline procedures outlined here (https://phyluce.readthedocs.io/en/latest/daily-use/daily-use-3-uce-processing.html), I consistently get an AttributeError.

Do you have any insights as to why that is? I didn't attach the input .fna file in question, as it is very large, but I can do so if that would be helpful.

INPUT CODE

# the junco file listed above is in the contigs/ directory
phyluce_assembly_match_contigs_to_probes \
    --contigs contigs/ \
    --probes uce-5k-probes.fasta \
    --output junco_hyemalis_UCE/ \
    --log-path log

OUTPUT

  File "/Users/melospiza/miniconda3/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 421, in <module>
    main()
  File "/Users/melospiza/miniconda3/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 354, in main
    contig_name = get_contig_name(lz.name1)
  File "/Users/melospiza/miniconda3/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 279, in get_contig_name
    return match.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

brantfaircloth commented 3 years ago

Did you follow Tutorial 3? My guess is that it is due to the headers not being quite right when you run match_contigs_to_probes… but if you got through the first part of Tutorial 3, the headers should be correct. I’m also on vacation at the moment, so have limited ability to check things.

djlduckett commented 3 years ago

Hi Brant. Let me know if you would like me to make this a separate issue, but I am having a similar problem. I assembled my UCE contigs for each sample using itero, and then put all the contig files in a folder named contigs. I also have the Tetrapods-UCE-5Kv1.fasta probe file.

When I run phyluce_assembly_match_contigs_to_probes --contigs contigs/ --probes Tetrapods-UCE-5Kv1.fasta --output match/, I get the same error as above. The header lines from my contig FASTA files look like >uce-10_1_length_259_cov_8.247788. I do get a lastz and sqlite file for my first sample only. The lastz file has lines like this:

10271 >uce-2_1_length_304_cov_16.118081 + 82 203 121 >uce-2_p1 |source:faircloth,probes-id:2247,probes-locus:2,probes-probe:1 - 0 120 120 ....:....x...................................................................................................x.-......... 111M1D9M 117/120 97.5% 120/121 99.2%
11298 >uce-3_1_length_358_cov_6.458462 + 106 226 120 >uce-3_p1 |source:faircloth,probes-id:9417,probes-locus:3,probes-probe:1 + 0 120 120 ........................................................................................................................ 120M 120/120 100.0% 120/120 100.0%

Do you have any ideas of how I can fix this error?

brantfaircloth commented 3 years ago

You should be able to create/edit ~/.phyluce.conf and add:

[headers]
trinity:comp\d+_c\d+_seq\d+|c\d+_g\d+_i\d+|TR\d+\|c\d+_g\d+_i\d+|TRINITY_DN\d+_c\d+_g\d+_i\d+
velvet:node_\d+
abyss:node_\d+
idba:contig-\d+_\d+
spades:NODE_\d+_length_\d+_cov_\d+.\d+
itero:uce-\d+_length_\d+_cov_\d+.\d+

This is probably the easiest way to fix. Alternatively, you could edit the config file in phyluce, which is nested within your conda environment (on my machine the path to that file is ~/miniconda3/envs/phyluce-1.7.1/phyluce/config/phyluce.conf).
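
To see whether any pattern in a [headers] section like the one above recognizes a given contig header, a check along these lines may help. This is only a sketch: it assumes the section can be read with Python's configparser and that each pattern is applied as a case-sensitive regex search. The helper names (load_header_patterns, matching_assemblers) and the sample headers are made up for illustration and are not part of phyluce.

```python
import configparser
import re

# A subset of the [headers] section quoted above, embedded as text so the
# sketch runs without touching ~/.phyluce.conf.
CONF_TEXT = r"""
[headers]
velvet:node_\d+
abyss:node_\d+
idba:contig-\d+_\d+
spades:NODE_\d+_length_\d+_cov_\d+.\d+
itero:uce-\d+_length_\d+_cov_\d+.\d+
"""


def load_header_patterns(text):
    """Hypothetical helper: compile every pattern in the [headers] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: re.compile(value) for name, value in parser.items("headers")}


def matching_assemblers(header, patterns):
    """Hypothetical helper: names of the patterns that match the header."""
    return [name for name, rx in patterns.items() if rx.search(header)]


patterns = load_header_patterns(CONF_TEXT)
# A made-up SPAdes-style header is recognized by the spades pattern only.
print(matching_assemblers("NODE_1_length_500_cov_12.5", patterns))
# A header of the form reported in this thread matches none of the patterns.
print(matching_assemblers("uce-2_1_length_304_cov_16.118081", patterns))
```

An empty list for a header means no configured pattern recognizes it, which is consistent with the 'NoneType' object has no attribute 'groups' error reported above.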

djlduckett commented 3 years ago

That worked! I just had to make a little tweak to the expression, because my itero headers have an extra numeric field between the locus number and "length":

itero:uce-\d+_\d+_length_\d+_cov_\d+.\d+

Thanks for your help!
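
For anyone hitting the same error, the difference between the two itero expressions can be checked directly with Python's re module. This is a minimal sketch, not phyluce's own code: the contig_name helper is hypothetical, and wrapping the pattern in a capturing group is an assumption made only to mirror the match.groups()[0] call in the traceback. The header is copied from the lastz output earlier in the thread.

```python
import re

# Contig name as it appears in the lastz output above.
HEADER = ">uce-2_1_length_304_cov_16.118081"

# Expression as first suggested, and the tweaked version that worked.
ORIGINAL = r"uce-\d+_length_\d+_cov_\d+.\d+"
TWEAKED = r"uce-\d+_\d+_length_\d+_cov_\d+.\d+"


def contig_name(pattern, header):
    """Hypothetical helper: return the matched contig name, or None.

    The capturing group is an assumption of this sketch; calling
    .groups() on a failed match (None) is what raises the
    AttributeError in the traceback.
    """
    match = re.search("({})".format(pattern), header)
    return match.groups()[0] if match else None


print(contig_name(ORIGINAL, HEADER))  # None: "_1" sits between "uce-2" and "_length"
print(contig_name(TWEAKED, HEADER))   # uce-2_1_length_304_cov_16.118081
```

The original expression expects "_length" immediately after the locus number, so headers with an extra "_1" field never match; the added "_\d+" accounts for that field.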