faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce_assembly_match_contigs_to_probes error #202

Closed. GabiCamacho closed this issue 3 years ago.

GabiCamacho commented 3 years ago

Dear,

I'm having problems running phyluce_assembly_match_contigs_to_probes on my data. It fails with the following error:

19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - ======= Starting phyluce_assembly_match_contigs_to_probes =======
2020-09-22 19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - Version: git fatal: Not a git repository: '/home/bonnie/anaconda3/envs/py27/lib/python2.7/site-packages/.git'
2020-09-22 19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --contigs: /media/bonnie/Data_Drive/others/Heteroponerinae-contigs
2020-09-22 19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --dupefile: None
2020-09-22 19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --keep_duplicates: None
2020-09-22 19:29:57,008 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --log_path: /media/bonnie/Data_Drive/others/log-heteroponerinae
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --min_coverage: 50
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --min_identity: 80
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --output: /media/bonnie/Data_Drive/others/uce-search-cov50-heteroponerinae
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --probes: /media/bonnie/Data_Drive/others/hym-probes-v2-ant-specific-uce-only.fasta
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --regex: ^(uce-\d+)(?:_p\d+.*)
2020-09-22 19:29:57,009 - phyluce_assembly_match_contigs_to_probes - INFO - Argument --verbosity: INFO
2020-09-22 19:29:57,151 - phyluce_assembly_match_contigs_to_probes - INFO - Creating the UCE-match database
2020-09-22 19:29:57,325 - phyluce_assembly_match_contigs_to_probes - INFO - Processing contig data
2020-09-22 19:29:57,326 - phyluce_assembly_match_contigs_to_probes - INFO - -----------------------------------------------------------------
2020-09-22 19:30:26,227 - phyluce_assembly_match_contigs_to_probes - INFO - ECTA07: 2332 (3.55%) uniques of 65683 contigs, 0 dupe probe matches, 70 UCE loci removed for matching multiple contigs, 22 contigs removed for matching multiple UCE loci
2020-09-22 19:30:44,493 - phyluce_assembly_match_contigs_to_probes - INFO - GAB75: 2261 (8.39%) uniques of 26951 contigs, 0 dupe probe matches, 43 UCE loci removed for matching multiple contigs, 12 contigs removed for matching multiple UCE loci
2020-09-22 19:31:06,975 - phyluce_assembly_match_contigs_to_probes - INFO - GPC01: 2325 (5.59%) uniques of 41576 contigs, 0 dupe probe matches, 73 UCE loci removed for matching multiple contigs, 15 contigs removed for matching multiple UCE loci
2020-09-22 19:31:22,458 - phyluce_assembly_match_contigs_to_probes - INFO - GPC02: 2096 (9.62%) uniques of 21791 contigs, 0 dupe probe matches, 70 UCE loci removed for matching multiple contigs, 2 contigs removed for matching multiple UCE loci
2020-09-22 19:44:30,714 - phyluce_assembly_match_contigs_to_probes - INFO - RHY06: 2300 (6.48%) uniques of 35473 contigs, 0 dupe probe matches, 86 UCE loci removed for matching multiple contigs, 18 contigs removed for matching multiple UCE loci
Traceback (most recent call last):
  File "/home/bonnie/anaconda3/envs/py27/bin/phyluce_assembly_match_contigs_to_probes", line 342, in <module>
    main()
  File "/home/bonnie/anaconda3/envs/py27/bin/phyluce_assembly_match_contigs_to_probes", line 289, in main
    for lz in lastz.Reader(output):
  File "/home/bonnie/anaconda3/envs/py27/lib/python2.7/site-packages/phyluce/lastz.py", line 119, in __iter__
    yield self.next()
  File "/home/bonnie/anaconda3/envs/py27/lib/python2.7/site-packages/phyluce/lastz.py", line 140, in next
    lastz_result_split[k] = float(v.strip('%'))
ValueError: could not convert string to float: >1723_0.00852258%_cov_13 len=856
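The last frame of the traceback shows the failing conversion, `float(v.strip('%'))`. The minimal Python reproduction below uses only that expression and the string from the error message; the wrapper name `parse_percent` is hypothetical. It shows why the value cannot be parsed: `str.strip('%')` removes percent signs only at the ends of a string, and what remains is a contig name rather than a number. How that name ended up in a numeric field of the lastz output is not shown in the log.

```python
def parse_percent(value):
    """Hypothetical wrapper around the conversion seen in the traceback:
    strip '%' from both ends of the string, then cast it to float."""
    return float(value.strip('%'))


# A normal percentage field converts cleanly.
print(parse_percent("95.5%"))  # 95.5

# The value from the error message does not: its '%' sits in the middle,
# and it also contains '>', letters, underscores, and a space.
bad_field = ">1723_0.00852258%_cov_13 len=856"
try:
    parse_percent(bad_field)
except ValueError as err:
    print("ValueError:", err)
```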

I believe the problem is that I have two sets of contigs with two different header formats:

Set 1 runs fine, and its headers look like this:

>comp0_c0_seq1 len=251 path=[229:0-250]
GGAGGATTCTTTCCTTTATTATGCTACATGTACATGAATATATATGTGCATTATGTTAGC
CGGTCGGTATATTTGTCGTCGTGTCCATGAAATCCCATAAAACCGCGTACGAGAGATATG
GAGAAGGAGCGACAGAGAAGGAGATGAACATTCCTGCGGAATCAAAGTACCGCGCGGAGA
GATGTAGAGAAAAATCGGGTTTCTTTGGTGAGCAAGATGTCCCGGGAGGTTTAAAGGACG
AAGGAGTCCCG

Set 2 won't run, either together with the other contigs or on its own, and its headers look like this:

>3_0.277257%_cov_268 len=1326
TATATATATATATATATATATATATATTAGGTGTACAAAAAAGTTCTGAGCTTTTTTTTGTAAAATTCAATCTTTATTCAAAAACAAACAAAAAAATATTAATCAGCGAAATATTCCCCGTTTGCATCAACGACCTTTTGCCATCTTTCGGGTAGCTTGCGAATTCCTTGACGATAGAAGCTCACCGGCTTCGAAGCGATAAAGTCATCAATGCATTTTCGTATCTCTTCAAATCTCACGAAATGTGTATCAGCCAAATGATGCTGCAGCGACCGGA
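The two header styles have clearly different shapes. The sketch below uses two hypothetical regular expressions, modeled only on the example headers in this issue (they are not phyluce's built-in contig-name patterns), to show that a pattern written for one naming scheme does not match the other at all. Both are anchored at the start of the header via `re.match`.

```python
import re

# Hypothetical patterns modeled on the two example headers in this issue;
# these are NOT phyluce's built-in contig-name expressions.
SET1_STYLE = re.compile(r"comp\d+_c\d+_seq\d+")    # e.g. comp0_c0_seq1
SET2_STYLE = re.compile(r"\d+_\d+\.\d+%_cov_\d+")  # e.g. 3_0.277257%_cov_268

headers = {
    "Set 1": "comp0_c0_seq1 len=251 path=[229:0-250]",
    "Set 2": "3_0.277257%_cov_268 len=1326",
}

for label, header in headers.items():
    # re.match only tries the beginning of the string, so each pattern
    # must describe how the header starts to count as a hit.
    print(label,
          "| matches Set 1 style:", SET1_STYLE.match(header) is not None,
          "| matches Set 2 style:", SET2_STYLE.match(header) is not None)
```

Each header matches only its own pattern, which is why a tool expecting one naming scheme can mishandle contigs named in the other.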

Could you help me solve this problem, please?

Thank you in advance.

brantfaircloth commented 3 years ago

This is definitely due to the contig naming scheme that looks like 3_0.277257%_cov_268 len=1326. This format is not expected by phyluce (and it is also not produced by phyluce). If you would like to use these data without changing the contig headers, you will need to adjust the regular expressions that phyluce uses to identify contigs. These can be adjusted in the phyluce configuration file, which you can create at ~/.phyluce/config.
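If changing the headers is acceptable, an alternative to editing the configuration is to rename the Set 2 contigs into the Set 1 style, which is already known to run. The following is a minimal sketch, assuming every Set 2 header looks like `<number>_<decimal>%_cov_<number>` optionally followed by ` len=<number>`. The `compN_c0_seq1` output names merely imitate the Set 1 headers and are not a documented phyluce requirement; `SET2_HEADER` and `rename_headers` are hypothetical names. The function also returns a mapping from new to old names so results can be traced back to the original assembly.

```python
import re

# Assumed shape of the Set 2 headers in this issue, e.g.
# ">3_0.277257%_cov_268 len=1326"; the optional "len=..." field is kept.
SET2_HEADER = re.compile(r"^>\d+_[\d.]+%_cov_\d+(?:\s+(len=\d+))?\s*$")


def rename_headers(lines):
    """Rename Set 2-style FASTA headers to a Set 1-like 'compN_c0_seq1' form.

    Returns (new_lines, mapping), where mapping maps each new name to the
    original header text. Sequence lines, and headers that do not match
    SET2_HEADER, are passed through unchanged. Names are only unique within
    one call, so process one contig file per call.
    """
    new_lines = []
    mapping = {}
    counter = 0
    for line in lines:
        line = line.rstrip("\n")
        match = SET2_HEADER.match(line)
        if match:
            # Hypothetical naming scheme that imitates the Set 1 headers.
            new_name = "comp{}_c0_seq1".format(counter)
            counter += 1
            mapping[new_name] = line[1:]
            length_field = match.group(1)
            new_lines.append(">" + new_name + (" " + length_field if length_field else ""))
        else:
            new_lines.append(line)
    return new_lines, mapping


if __name__ == "__main__":
    example = [">3_0.277257%_cov_268 len=1326", "TATATATATA"]
    renamed, names = rename_headers(example)
    print(renamed)  # ['>comp0_c0_seq1 len=1326', 'TATATATATA']
    print(names)    # {'comp0_c0_seq1': '3_0.277257%_cov_268 len=1326'}
```

To apply this to one sample, the lines of its contig file can be passed in and `"\n".join(new_lines)` written to a new file; keeping `mapping` makes it possible to translate renamed contigs in any downstream output back to their original names.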

brantfaircloth commented 3 years ago

One other quick note: it may be easiest to simply re-assemble the data for the contigs with the weird headers (that's a really odd format).

GabiCamacho commented 3 years ago

Thank you very much Brant!
