faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Using harvested genome sequences in downstream analyses #299

Closed alexfranzen closed 1 year ago

alexfranzen commented 1 year ago

Hi. I am working on harvesting UCE loci from genome skimming data and was able to complete the Tutorial III procedures with my data without issues. I then followed the instructions for the "Extracting UCE loci" procedure, and I am running into an issue that I believe is with the sqlite file generated in Tutorial III. When I run the 'phyluce_assembly_get_match_counts' step with that .sqlite file, this is the error I get:

(phyluce-1.7.2) [franzena@hydra-login01 uce-genome-test]$ phyluce_assembly_get_match_counts --locus-db UCE-genome-test.sqlite --taxon-list-config taxon-set.conf --taxon-group 'all' --incomplete-matrix --output taxon-sets/all/all-taxa-incomplete.conf
2023-03-22 19:42:07,403 - phyluce_assembly_get_match_counts - INFO - =========== Starting phyluce_assembly_get_match_counts ==========
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Version: 1.7.2
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Commit: None
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Argument --extend_locus_db: None
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Argument --incomplete_matrix: True
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Argument --keep_counts: False
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Argument --locus_db: /scratch/genomics/franzena/uce-genome-test/UCE-genome-test.sqlite
2023-03-22 19:42:07,404 - phyluce_assembly_get_match_counts - INFO - Argument --log_path: None
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --optimize: False
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --output: /scratch/genomics/franzena/uce-genome-test/taxon-sets/all/all-taxa-incomplete.conf
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --random: False
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --sample_size: 10
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --samples: 10
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --silent: False
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --taxon_group: all
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --taxon_list_config: /scratch/genomics/franzena/uce-genome-test/taxon-set.conf
2023-03-22 19:42:07,405 - phyluce_assembly_get_match_counts - INFO - Argument --verbosity: INFO
2023-03-22 19:42:07,406 - phyluce_assembly_get_match_counts - INFO - There are 1 taxa in the taxon-group '[all]' in the config file taxon-set.conf
2023-03-22 19:42:07,407 - phyluce_assembly_get_match_counts - INFO - Getting UCE names from database
Traceback (most recent call last):
  File "/home/franzena/.conda/envs/phyluce-1.7.2/bin/phyluce_assembly_get_match_counts", line 409, in <module>
    main()
  File "/home/franzena/.conda/envs/phyluce-1.7.2/bin/phyluce_assembly_get_match_counts", line 382, in main
    uces = get_uce_names(log, c)
  File "/home/franzena/.conda/envs/phyluce-1.7.2/bin/phyluce_assembly_get_match_counts", line 126, in get_uce_names
    c.execute("SELECT uce FROM matches")
sqlite3.OperationalError: no such table: matches

I'm not sure if this is an error on my part or if there is an issue with phyluce_assembly_get_match_counts not recognizing the table. I've been able to successfully do this step before when doing the phyluce_assembly_match_contigs_to_probes step with the same data. I used the scaffolds.fasta file generated from the spades assemblies in the harvesting step (converted to a 2bit file), so maybe that could also be part of the issue, though like I said, the workflow seemed to work as intended. Any help would be appreciated, and thank you very very much for providing and maintaining this resource!

brantfaircloth commented 1 year ago

I think the issue may be that you need to start one step before - in the “Finding UCE loci” section. The input for that is the output from the extraction process. This sort of seems like running the same thing twice - but it’s not, quite.
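
For anyone who hits the same "no such table: matches" error, one quick check is to list the tables in the .sqlite file before passing it to phyluce_assembly_get_match_counts. The sketch below is not part of phyluce. It uses only Python's standard sqlite3 module, and the function names (list_tables, has_matches_table) are made up for illustration. The only table name it relies on is "matches", taken from the query shown in the traceback above.

```python
import os
import sqlite3
import sys


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path."""
    # sqlite3.connect() silently creates a missing file, so check first.
    if not os.path.exists(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def has_matches_table(db_path):
    """True if the database contains the 'matches' table that the
    failing query in the traceback (SELECT uce FROM matches) expects."""
    return "matches" in list_tables(db_path)


if __name__ == "__main__":
    # Hypothetical usage: python check_db.py UCE-genome-test.sqlite
    path = sys.argv[1]
    print("tables:", list_tables(path))
    print("has 'matches' table:", has_matches_table(path))
```

If the check reports that there is no "matches" table, the file is probably not the database this step expects, which fits the suggestion above to run the "Finding UCE loci" step first and use the database it produces.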

alexfranzen commented 1 year ago

That seemed to do the trick! Thanks for the very quick response!

brantfaircloth commented 1 year ago

Cool 😎