PreHGT runs at the genus level to pull in (pseudo) pangenome information, which is used to estimate contamination vs. real transfer events.
Right now, we don't do a good job of reporting how many, or which, species are represented in each genus. Below is some code I recently used to get the species (organism name) information from NCBI based on the genome accession (`GCA`/`GCF`).
I ran this on all files matching `download/*_genome.csv`.
Install tools
conda install -c conda-forge ncbi-datasets-cli jq
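Collect genome accessions without CSV headers
# Match only the per-genus input files so genomes.csv is never read back in
for infile in download/*_genome.csv
do
    # skip each file's header row
    tail -n +2 "$infile" >> genomes.csv
done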
Get species (organism name)
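The code for this step is missing from the post, so here is a minimal sketch of one way it could be done with the tools installed above. It is not necessarily what was originally run. The helper names (`extract_accessions`, `species_for_accessions`, `count_species`) are hypothetical. The sketch assumes accessions look like `GCA_…`/`GCF_…` somewhere on each line of `genomes.csv`, and that the installed `datasets` version emits reports with `.accession` and `.organism.organism_name` fields; older releases used a different JSON layout, so check the output before relying on it.

```shell
# Pull unique assembly accessions (GCA_/GCF_ plus digits, optional version)
# out of a file, regardless of which CSV column holds them.
extract_accessions() {
    grep -oE 'GC[AF]_[0-9]+(\.[0-9]+)?' "$1" | sort -u
}

# Look up each accession with the NCBI datasets CLI and print
# "accession<TAB>organism name". Needs network access.
# Assumption: the jq field names match the report schema of the
# installed datasets version.
species_for_accessions() {
    datasets summary genome accession --inputfile "$1" --as-json-lines |
        jq -r '[.accession, .organism.organism_name] | @tsv'
}

# Count genomes per organism name from the two-column table above,
# most common first.
count_species() {
    cut -f2 "$1" | sort | uniq -c | sort -rn
}
```

With these defined, `extract_accessions genomes.csv > accessions.txt`, then `species_for_accessions accessions.txt > species.tsv`, then `count_species species.tsv` would give a per-species tally. Running it once per `*_genome.csv` file instead of on the combined `genomes.csv` would give the tally per genus.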