faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

phyluce cannot get fasta files for samples #258

Closed hutumpyancha closed 2 years ago

hutumpyancha commented 2 years ago

Hello, I am stuck at another step of the analysis. For the "extracting UCE loci" step, I created the taxon-set.conf file and ran phyluce_assembly_get_match_counts to create the all-taxa-incomplete.conf file (attached as all-taxa-incomplete.pdf). On opening this file in a text editor, I noticed that the naming format of my samples had changed, e.g. ANN1071-Anthoptilum_sp became ANN1071_Anthoptilum_sp. I changed all the taxon names back to the naming format I had been using (attached as all-taxa-incomplete1.pdf) and used that file for the phyluce_assembly_get_fastas_from_match_counts step. This is where it stops working and tells me that it cannot find the fasta files associated with my samples. I have checked the specified paths and the spelling of the names, and every time I get the same error (attached as phyluce_get_fasta error.pdf). I have also tried the unchanged all-taxa-incomplete.conf file, and it gives me the same error.

Is there a reason why the names are changed in this file, and is that what is hindering this step even when I correct the names in a text editor?

Thank you so much again for all your assistance, and apologies for troubling you so many times.

Upasana

taxon-set.pdf all-taxa-incomplete1.pdf all-taxa-incomplete.pdf phyluce_get_fasta error.pdf screenshot_contigs_folder.pdf

brantfaircloth commented 2 years ago

The naming convention is changed on purpose. You don't want to change it back. As for the contigs, try replacing the dashes in the names with underscores.
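The dash-to-underscore rename suggested above could be scripted roughly as follows. This is only a minimal sketch: the function names `dashes_to_underscores` and `rename_contigs`, and the `.contigs.fasta` file name in the usage note, are illustrative assumptions and not part of phyluce. It defaults to a dry run so the planned changes can be reviewed first.

```python
from pathlib import Path


def dashes_to_underscores(name: str) -> str:
    """Return the name with every dash replaced by an underscore."""
    return name.replace("-", "_")


def rename_contigs(contig_dir, dry_run=True):
    """List (old, new) file names in contig_dir that contain dashes.

    With dry_run=False the files are actually renamed. The directory
    layout is an assumption; adjust the path to your own contigs folder.
    """
    changes = []
    # sorted() builds the full listing first, so renaming while looping is safe
    for path in sorted(Path(contig_dir).iterdir()):
        new_name = dashes_to_underscores(path.name)
        if new_name == path.name:
            continue  # nothing to change for this entry
        target = path.with_name(new_name)
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return changes
```

Calling `rename_contigs("contigs")` only reports what would change, for example a hypothetical `ANN1071-Anthoptilum_sp.contigs.fasta` becoming `ANN1071_Anthoptilum_sp.contigs.fasta`; passing `dry_run=False` performs the rename.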

hutumpyancha commented 2 years ago

Thank you so much for your response. I have removed the taxon names from the contig file names and kept only the sample names (e.g. ANN1071-Anthoptilum_sp. is now just ANN1071) and have re-run the subsequent steps. However, I am again getting an error at the phyluce_assembly_get_fastas_from_match_counts step. This time it says:

2021-11-11 20:53:31,718 - phyluce_assembly_get_fastas_from_match_counts - INFO - Argument --log_path: /ddnA/qb2work/upasanag/taxon_sets/all/log
2021-11-11 20:53:31,718 - phyluce_assembly_get_fastas_from_match_counts - INFO - Argument --match_count_output: /ddnA/qb2work/upasanag/taxon_sets/all/all-taxa-incomplete.conf
2021-11-11 20:53:31,718 - phyluce_assembly_get_fastas_from_match_counts - INFO - Argument --output: /ddnA/qb2work/upasanag/taxon_sets/all/all-taxa-incomplete.fasta
2021-11-11 20:53:31,719 - phyluce_assembly_get_fastas_from_match_counts - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/home/upasanag/.conda/envs/phyluce-1.7.1/bin/phyluce_assembly_get_fastas_from_match_counts", line 306, in <module>
    main()
  File "/home/upasanag/.conda/envs/phyluce-1.7.1/bin/phyluce_assembly_get_fastas_from_match_counts", line 209, in main
    len(organisms), os.path.basename(args.match_count_output)
TypeError: object of type 'NoneType' has no len()

brantfaircloth commented 2 years ago

It looks like there is a problem in your --match_count_output config file. Specifically, the error suggests that the [Organisms] section of the config file has some sort of problem. Without more information, I can't suggest much of a solution, except that (1) as before, you should not need to change the contig file names other than replacing the dashes in them with underscores, and (2) you should check all-taxa-incomplete.conf, because something seems wrong with the [Organisms] section of that file.
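One way to check the [Organisms] section before re-running the step is to read the file with Python's standard `configparser`. This is a rough sketch, not phyluce's own code: it assumes the file is INI-style with one sample name per line and no values, and `read_organisms` is a hypothetical helper name. A return value of `None` would be consistent with the `NoneType` error in the traceback above, though that link is an assumption as well.

```python
import configparser


def read_organisms(conf_path):
    """Return the names listed under [Organisms], or None if the section is absent.

    Assumes an INI-style file whose entries are bare names without values.
    """
    parser = configparser.RawConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep the original capitalisation of names
    if not parser.read(conf_path):
        # read() returns the list of files it managed to parse
        raise FileNotFoundError(conf_path)
    if not parser.has_section("Organisms"):
        return None
    return parser.options("Organisms")
```

For example, `read_organisms("all-taxa-incomplete.conf")` should return the sample names exactly as they appear in the file, which makes it easy to compare them against the contig file names.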

I would suggest you start the process again, with just a few contig assemblies (3-4) rather than many. Make sure those contig assemblies are named like ANN1071_Anthoptilum_sp or, better yet, Anthoptilum_sp_ANN1071, which is how we name everything (Genus_species_accession). Then, run through the entire process with those 3-4 to get everything working. Then repeat the process, increasing the sample to all of your contig files.
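The Genus_species_accession convention mentioned above can be produced from names like the ones in this thread with a small helper. This is a sketch under a stated assumption: the old names have the form accession, dash, taxon (as in ANN1071-Anthoptilum_sp). The function name `to_genus_species_accession` is hypothetical.

```python
def to_genus_species_accession(name: str, sep: str = "-") -> str:
    """Turn 'ANN1071-Anthoptilum_sp' into 'Anthoptilum_sp_ANN1071'.

    Assumes the accession comes first and is separated from the taxon
    by the first occurrence of sep.
    """
    accession, found, taxon = name.partition(sep)
    # drop a trailing period ('sp.') and any remaining dashes in the taxon part
    taxon = taxon.rstrip(".").replace("-", "_")
    if not found or not accession or not taxon:
        raise ValueError(f"cannot split {name!r} into accession and taxon")
    return f"{taxon}_{accession}"
```

Names that do not contain the separator raise a `ValueError` instead of being silently passed through, so unexpected inputs are noticed before any files are renamed.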

hutumpyancha commented 2 years ago

Changing the sample names to the format you suggested solved the problem; I have been able to go through the remaining steps without any glitches as well. Thank you so much for all your guidance and assistance, Regards, Upasana

brantfaircloth commented 2 years ago

Great! 👍