Closed by tkosciol 7 years ago.
Data location: the .faa file for each genome is in barnacle:/projects/genome_annotation/201605/annot, each in its own directory (e.g. G001281285/tmp/prodigal.faa). The genome IDs (e.g. G001281285) and their information are listed in this table: /home/evko1434/repophlan/repophlan_microbes_wscores.txt
@RNAer OK, maybe let's start by getting the number for micronota_raw (i.e. before any clustering) and putting all predicted genes in one place.
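A minimal sketch of the pooling step in Python, assuming the per-genome layout described above (<annot>/<genome_id>/tmp/prodigal.faa). The function name pool_prodigal_faa and the choice to prefix each FASTA header with its genome ID (so sequence IDs stay unique across genomes) are hypothetical, not an existing script. It also counts genomes and sequences, which would give the micronota_raw number.

```python
"""Pool per-genome Prodigal proteins into one FASTA and count them.

Hypothetical helper; assumes the layout <annot_dir>/<genome_id>/tmp/prodigal.faa.
"""
from pathlib import Path


def pool_prodigal_faa(annot_dir, out_faa):
    """Concatenate every prodigal.faa under annot_dir into out_faa.

    Each header is prefixed with its genome ID (e.g. >G001281285|...);
    this prefix scheme is an assumption made to keep IDs unique.
    Returns (number_of_genomes, number_of_sequences).
    """
    annot_dir = Path(annot_dir)
    n_genomes = n_seqs = 0
    with open(out_faa, "w") as out:
        for faa in sorted(annot_dir.glob("*/tmp/prodigal.faa")):
            genome_id = faa.parent.parent.name  # directory name, e.g. G001281285
            n_genomes += 1
            last = ""
            with open(faa) as fh:
                for line in fh:
                    if line.startswith(">"):
                        n_seqs += 1
                        line = f">{genome_id}|{line[1:]}"
                    out.write(line)
                    last = line
            # Guard against a file without a trailing newline, which would
            # otherwise glue the next genome's first header onto this line.
            if last and not last.endswith("\n"):
                out.write("\n")
    return n_genomes, n_seqs


if __name__ == "__main__":
    import sys

    genomes, seqs = pool_prodigal_faa(sys.argv[1], sys.argv[2])
    print(f"micronota genomes: {genomes}")
    print(f"micronota_raw: {seqs}")
```

Usage would be something like python pool.py /projects/genome_annotation/201605/annot micronota_raw.faa, with the script name and output filename chosen for illustration only.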
Done! The data are in /projects/microprot/data/micronota/clustering on Barnacle.
Run CD-HIT on the Micronota genes (Prodigal output) and calculate stats for a range of clustering thresholds.
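One way to drive the runs, as a sketch: build one cd-hit command per threshold and run them in turn. The threshold list, thread count and memory limit are placeholder assumptions. The -n word sizes follow the CD-HIT user guide's recommendations for protein clustering, and cd-hit does not accept -c below 0.4 for proteins. Output prefixes reuse the micronota_<threshold> naming from this issue.

```python
import subprocess

# Placeholder thresholds (percent identity); the issue does not fix the set.
THRESHOLDS = [100, 95, 90, 85, 80, 70, 60, 50]


def word_size(identity):
    """Word length (-n) recommended by the CD-HIT user guide for a protein cutoff."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cd-hit protein clustering needs -c >= 0.4")


def cdhit_command(in_faa, threshold, threads=8, memory_mb=16000):
    """Build the cd-hit argument list for one percent-identity threshold."""
    identity = threshold / 100
    return [
        "cd-hit",
        "-i", str(in_faa),
        "-o", f"micronota_{threshold}",  # cd-hit also writes <prefix>.clstr
        "-c", str(identity),
        "-n", str(word_size(identity)),
        "-T", str(threads),      # 0 would use all available CPUs
        "-M", str(memory_mb),    # memory limit in MB; 0 means unlimited
        "-d", "0",               # keep full IDs (up to first space) in .clstr
    ]


def run_all(in_faa, thresholds=THRESHOLDS):
    """Run cd-hit once per threshold; stops on the first failing run."""
    for t in thresholds:
        subprocess.run(cdhit_command(in_faa, t), check=True)
```

Running the thresholds one after another keeps memory use predictable on a shared node; they could also be submitted as separate cluster jobs from the same command lists.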
Write the output to micronota_stats.md.
Output structure:
date: DD-MM-YYYY
micronota genomes: X0
The number indicates the clustering threshold, e.g. micronota_90 means clustering at a 90% sequence identity threshold.
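A sketch of the stats step under the same assumptions: count the clusters in each CD-HIT .clstr file (each cluster starts with a ">Cluster" line followed by one line per member) and write micronota_stats.md with one "key: value" line per entry, following the output structure above. The micronota_raw line and the helper names cluster_sizes and write_stats are hypothetical; cluster_sizes also makes it easy to add other statistics such as singletons or the largest cluster.

```python
import datetime


def cluster_sizes(clstr_path):
    """Return the member count of each cluster in a CD-HIT .clstr file."""
    sizes = []
    with open(clstr_path) as fh:
        for line in fh:
            if line.startswith(">"):    # ">Cluster N" opens a new cluster
                sizes.append(0)
            elif line.strip():          # every other non-empty line is a member
                sizes[-1] += 1
    return sizes


def write_stats(path, n_genomes, n_raw, clstr_files):
    """Write the stats file; clstr_files maps a threshold (e.g. 90) to its .clstr path."""
    lines = [
        f"date: {datetime.date.today().strftime('%d-%m-%Y')}",
        f"micronota genomes: {n_genomes}",
        f"micronota_raw: {n_raw}",  # hypothetical entry: unclustered gene count
    ]
    # Highest threshold first, e.g. micronota_100, micronota_95, micronota_90, ...
    for threshold in sorted(clstr_files, reverse=True):
        n_clusters = len(cluster_sizes(clstr_files[threshold]))
        lines.append(f"micronota_{threshold}: {n_clusters}")
    with open(path, "w") as out:
        out.write("\n".join(lines) + "\n")
```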