Closed idanaughton closed 2 years ago
The header should be truncated by biopython at the first space (which comes before the pipe "|" character). Given that, it looks like you will need to update the phased regular expression to something like:
phased:uce-\d+_INM\d+_\d+
which adds the "M" in the sample name. If the "M" can be any letter (e.g. it's different by sample), then something like:
phased:uce-\d+_\w+_\d+
adding the "\w+" in place of "INM\d+" should catch all letter/number characters in that middle position.
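A quick way to check candidate expressions before editing the config is to run them against one truncated header. This is only a sketch: the header is the example from this thread, the final group is written as `\d+`, and the use of `re.fullmatch` is an assumption about how strictly the name should match, not a description of what phyluce does internally.

```python
import re

# Full header as written in the phased FASTA file (example from this thread).
raw_header = "uce-11841_INM640_0 |uce-11841_phased"

# Only the text before the first space is kept, as described in the reply.
name = raw_header.split(" ", 1)[0]

# Candidate expressions, without the "phased:" config key.
candidates = {
    "IN + digits": r"uce-\d+_IN\d+_\d+",
    "INM + digits": r"uce-\d+_INM\d+_\d+",
    "any word characters": r"uce-\d+_\w+_\d+",
}

# Assumption: the expression should cover the whole truncated name.
results = {
    label: re.fullmatch(pattern, name) is not None
    for label, pattern in candidates.items()
}

for label, matched in results.items():
    print(f"{label}: {'match' if matched else 'no match'}")
```

The first expression fails because `\d+` cannot consume the "M" in "INM640"; the other two match the whole name.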
This worked, thank you! I just had to figure out how to truncate with biopython and tweak the expression to: uce-\d+_IN\w+_\d
Thanks again!
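For anyone landing here later, one possible way to do the truncation is sketched below. It is not necessarily what was done in this thread. The function names `first_token` and `truncate_fasta_headers` are hypothetical; the Biopython calls (`SeqIO.parse`, `SeqIO.write`, and the `id` and `description` attributes of a record) are standard, and Biopython must be installed for the second function to run.

```python
def first_token(header: str) -> str:
    """Return the header text up to the first whitespace character."""
    return header.split(None, 1)[0]


def truncate_fasta_headers(in_path: str, out_path: str) -> int:
    """Rewrite a FASTA file so each header keeps only its first token.

    Returns the number of records written.
    """
    from Bio import SeqIO  # imported here so first_token works without Biopython

    def shortened_records():
        for record in SeqIO.parse(in_path, "fasta"):
            # record.id already holds the text before the first whitespace;
            # clearing the description makes the writer emit only the id.
            record.description = ""
            yield record

    return SeqIO.write(shortened_records(), out_path, "fasta")


print(first_token("uce-11841_INM640_0 |uce-11841_phased"))
```

Calling `truncate_fasta_headers("phased.fasta", "phased.short.fasta")` (both file names are placeholders) would write a copy whose headers look like `>uce-11841_INM640_0`.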
You bet 👍
I'm attempting to match my phased UCE sequences to my UCE probes using phyluce_assembly_match_contigs_to_probes, so that I can start the phyluce process over with my phased data and construct alignments of taxon-specific groups from my phased samples. I get the following error when running phyluce_assembly_match_contigs_to_probes with my phased fasta files and probe set:
Traceback (most recent call last):
  File "/data/home/idanaughton/.conda/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 421, in <module>
    main()
  File "/data/home/idanaughton/.conda/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 354, in main
    contig_name = get_contig_name(lz.name1)
  File "/data/home/idanaughton/.conda/envs/phyluce-1.7.1/bin/phyluce_assembly_match_contigs_to_probes", line 279, in get_contig_name
    return match.groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'
I'm guessing this has to do with how my phased reads are named, which follows this convention: uce-11841_INM640_0 |uce-11841_phased where INM640 is the sample name. I tried adding a config file at ~/.phyluce.conf with the following contents (after reading through other issues below):
[headers]
trinity:comp\d+_c\d+_seq\d+|c\d+_g\d+_i\d+|TR\d+|c\d+_g\d+_i\d+|TRINITY_DN\d+_c\d+_g\d+i\d+
velvet:node\d+
abyss:node\d+
idba:contig-\d+\d+
spades:NODE_\d+length\d+cov\d+.\d+
itero:uce-\d+length\d+cov\d+.\d+
phased:uce-\d+_IN\d+_d+ |uce-\d+_phased
but still get the same error.
Any pointers here would be much appreciated. Thanks much!
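The `AttributeError` in the traceback is what Python raises when a regular expression finds no match: `re.search` returns `None`, and `None` has no `groups()` method. The sketch below reproduces that situation with the `phased` expression from the config above. The function is a hypothetical stand-in written for illustration, not phyluce's actual `get_contig_name`, and anchoring the pattern at the start of the header is an assumption.

```python
import re


def get_contig_name(header: str, pattern: str) -> str:
    """Hypothetical stand-in for the function named in the traceback."""
    # Assumption: the expression is wrapped in one group and anchored at the start.
    match = re.search(f"^({pattern})", header)
    if match is None:
        # Without this check, match.groups() raises the AttributeError shown above.
        raise ValueError(f"header {header!r} does not match {pattern!r}")
    return match.groups()[0]


header = "uce-11841_INM640_0"
attempted = r"uce-\d+_IN\d+_d+ |uce-\d+_phased"  # value from the config in this post

# Neither alternative matches: "M" is not a digit, and the name has no "_phased".
print(re.search(f"^({attempted})", header))
```

Raising a descriptive error here is only one design choice; the point is that a `None` match means the header and the expression disagree, so the expression is the thing to adjust.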