Open oschakoory opened 2 years ago
Hi, you did not provide a label file.
Hi, there is no option to provide a label file to tfrec_train_kmer.sh.
Is there another way to train the NN with this database?
Thank you for your help.
Hi,
The training_set_read_parser function in seq2tfrec_kmer.py parses each training/eval read in biopython-parsed format. Taxon ids are assumed to be available in the read names. For example, for a read named >NC_018018.1|999|GCF_000265505.1-200000, 999 is parsed as its species taxon id.
Please check whether the taxon ids have been included in read names.
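To make the check concrete, here is a minimal sketch of the naming convention described above. It assumes the taxon id is the second `|`-separated field of the read name, as in the example. The helper name taxon_id_from_read_name is hypothetical and this is not the actual code of training_set_read_parser.

```python
def taxon_id_from_read_name(read_name: str) -> int:
    """Return the taxon id embedded in a read name.

    Assumes the layout '<accession>|<taxon id>|<anything>', inferred from the
    example read name in this thread (not taken from seq2tfrec_kmer.py).
    """
    fields = read_name.lstrip(">").split("|")
    if len(fields) < 2 or not fields[1].isdigit():
        raise ValueError(f"no taxon id found in read name: {read_name!r}")
    return int(fields[1])


print(taxon_id_from_read_name(">NC_018018.1|999|GCF_000265505.1-200000"))
```

A header without `|`-separated fields, such as a raw SILVA header, raises ValueError under this sketch, which is one way to spot reads that still need a taxon id.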
The SILVA database looks like this:
>MF461073.1.1202 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Aeromonadaceae;Aeromonas;Aeromonas sp.
GUGCCAUGCGGCAGCUACACAUGCAGUCGAGCGGCAGCGGGAAAGUAGCUUGCUACUUUUGCCGGCGAGCGGCGGACGGG
>JQ063432.1.1464 Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Pectobacteriaceae;Sodalis;endosymbiont of Columbicola koopae
AUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUUGAGCGGCAGCGGGAAGAGGCUUGCUUCUUUGCCGGCGAGCGGCG
>EF216903.1.2125 Eukaryota;Amorphea;Amoebozoa;Discosea;Flabellinia;Dactylopodida;Neoparamoeba;Paramoeba perurans
ACCUGGUUGAUCCUGCCAGUAGUCAUAUGCUUGUCUUAAAGACUAAGCCAUGCACGUCUAAGUAUAAACACUUUGUACUU
In this case, how should I generate the label file?
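One possible approach, sketched under stated assumptions: rewrite each SILVA header so that an integer label sits in the second `|`-separated field, mirroring the read-name example given in this thread. The labels below are placeholders assigned in order of first appearance of each taxonomy string; they are not NCBI taxon ids, and whether DeepMicrobes accepts such labels is not confirmed here. The functions read_fasta and relabel are hypothetical helpers, not part of DeepMicrobes.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def relabel(records: Iterable[Tuple[str, str]],
            label_ids: Dict[str, int]) -> Iterator[Tuple[str, str]]:
    """Rewrite 'ACCESSION taxonomy' headers as 'ACCESSION|label|ACCESSION'.

    label_ids is filled in place: each new taxonomy string gets the next
    unused integer (a placeholder label, not an NCBI taxon id).
    """
    for header, seq in records:
        accession, _, taxonomy = header.partition(" ")
        label = label_ids.setdefault(taxonomy, len(label_ids))
        yield f"{accession}|{label}|{accession}", seq


silva = [
    ">MF461073.1.1202 Bacteria;Proteobacteria;Gammaproteobacteria;"
    "Enterobacterales;Aeromonadaceae;Aeromonas;Aeromonas sp.",
    "GUGCCAUGCGGCAGCUACACAUGCAGUCGAGCGGCAGCGGGAAAGUAGCUUGCUACUUUUGCCGGCGAGCGGCGGACGGG",
    ">JQ063432.1.1464 Bacteria;Proteobacteria;Gammaproteobacteria;"
    "Enterobacterales;Pectobacteriaceae;Sodalis;endosymbiont of Columbicola koopae",
    "AUUGAACGCUGGCGGCAGGCCUAACACAUGCAAGUUGAGCGGCAGCGGGAAGAGGCUUGCUUCUUUGCCGGCGAGCGGCG",
]

labels: Dict[str, int] = {}
for name, seq in relabel(read_fasta(silva), labels):
    print(f">{name}\n{seq}")
```

The labels dictionary built along the way maps each taxonomy string to its placeholder label, so it could be written out as a lookup table if one is needed later.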
Thank you for such a quick response.
Hi, I would like to train the network on the SILVA 138.1 SSU database using
tfrec_train_kmer.sh -i SILVA_138.1_SSURef_NR99_tax_silva.fasta -v /vocabulary/tokens_merged_12mers.txt -o train.tfrec -s 20480000 -k 12
However, I am getting the following error:
Can you help me please?
Thank you.
Originally posted by @oschakoory in https://github.com/MicrobeLab/DeepMicrobes/issues/17#issuecomment-1143659382