CALeDNA / crux

Pipeline For Creating eDNA Reference Libraries
https://ucedna.com/software

Make list of all GenBank accessions for type "environmental sample" or "ENV" #11

Open max-mapper opened 2 years ago

max-mapper commented 2 years ago

I think we need to download all the gbenv* files from https://ftp.ncbi.nlm.nih.gov/genbank/gbenv1 (e.g. gbenv1.seq.gz through gbenv72.seq.gz), then extract them and parse the value on every VERSION line (e.g. VERSION AB000684.1) to get a list of accessions like

AB000684.1
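
A rough sketch of the parsing step, assuming each gbenv file is a standard GenBank flat file in which VERSION starts at column 0 and the accession.version is the second whitespace-separated token. The function names `extract_versions` and `versions_from_gz` are made up for illustration and are not part of crux.

```python
import gzip
from typing import Iterable, Iterator


def extract_versions(lines: Iterable[str]) -> Iterator[str]:
    """Yield the accession.version token from each VERSION line."""
    for line in lines:
        # Assumption: VERSION is a top-level keyword at column 0.
        if line.startswith("VERSION"):
            fields = line.split()
            # Skip a malformed VERSION line that carries no value.
            if len(fields) >= 2:
                yield fields[1]


def versions_from_gz(path: str) -> Iterator[str]:
    """Stream a gzipped flat file so it never has to be extracted to disk."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        yield from extract_versions(handle)
```

Reading the .gz directly would avoid the separate extract step, and something like `set(versions_from_gz("gbenv1.seq.gz"))` would give the accessions for one file.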

Then we can use that list to filter all the environmental samples out of the NT FASTA files the next time we run BWA.
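
The filtering step could look something like this. It is only a sketch, and it assumes each NT header has the form ">ACCESSION.VERSION description", so the first token after ">" is compared against the list. Headers that merge several entries on one line are not handled. `filter_fasta` is a hypothetical name.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], excluded: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, dropping records whose accession is in excluded."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # Assumption: the accession.version is the first token of the header.
            fields = line[1:].split(None, 1)
            accession = fields[0] if fields else ""
            keep = accession not in excluded
        # Sequence lines inherit the decision made at their header.
        if keep:
            yield line
```

Because it works line by line, a whole NT chunk never has to be held in memory; only the accession set does.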

max-mapper commented 2 years ago

I am assuming AB000684.1 shows up in one of the NT chunks somewhere; we might want to validate that.
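
One way to check that assumption, sketched under the same guess about the header layout (accession.version as the first token after ">"): scan the headers of an NT chunk and report which of the wanted accessions actually appear. `accessions_present` is a hypothetical helper.

```python
from typing import Iterable, Set


def accessions_present(lines: Iterable[str], wanted: Set[str]) -> Set[str]:
    """Return the subset of wanted accessions found in FASTA header lines."""
    found: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if fields and fields[0] in wanted:
                found.add(fields[0])
    return found
```

If the result for {"AB000684.1"} comes back empty across every chunk, the header format or the accession list probably needs another look before the filter is trusted.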