Open realmarcin opened 4 months ago
@bsantan let's think about how to add this to the transform code. I believe excluding any reference proteome whose name contains 'virus' or 'phage' will work. We can also assume that no multicellular organism is labeled 'unclassified', though this may not be entirely true. The first pass/test transform could simply exclude anything marked 'unclassified'.
These viruses are found in the ncbitaxon_removed_subset.json:
"val" : "Cotton leaf curl Rajasthan virus betasatellite defective interfering DNA",
"lbl" : "Cotton leaf curl Rajasthan virus defective interfering DNA",
"lbl" : "Cotton leaf curl virus betasatellite defective interfering DNA",
"lbl" : "Hygrophorus parvirussula",
"lbl" : "unidentified Cotton leaf curl Rajasthan virus-associated DNA",
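
A minimal sketch of the name filter proposed above, in Python. All function and variable names here are hypothetical and not taken from the existing transform code. It compares a plain substring match with a match on whole words ending in 'virus' or 'phage'. The comparison matters because a substring match also catches "Hygrophorus parvirussula", which appears to be a fungal species name that merely contains the letters "virus".

```python
import re

# Hypothetical names throughout; the actual transform code may be organized differently.
EXCLUDED_TERMS = ("virus", "phage", "unclassified")

# Matches a word ending in "virus" or "phage" (optionally plural),
# or the standalone word "unclassified". Case-insensitive.
EXCLUDED_WORD = re.compile(
    r"\b(?:\w*(?:virus|phage)(?:es|s)?|unclassified)\b", re.IGNORECASE
)


def is_excluded_substring(name):
    """Naive check: True if any excluded term occurs anywhere in the name."""
    lowered = name.lower()
    return any(term in lowered for term in EXCLUDED_TERMS)


def is_excluded_word(name):
    """Stricter check: the term must end a word, e.g. 'Betacoronavirus' or 'phage'."""
    return EXCLUDED_WORD.search(name) is not None


def filter_names(names, is_excluded=is_excluded_word):
    """Return only the names that the chosen check does not exclude."""
    return [name for name in names if not is_excluded(name)]


# The labels quoted above from ncbitaxon_removed_subset.json.
LABELS = [
    "Cotton leaf curl Rajasthan virus betasatellite defective interfering DNA",
    "Cotton leaf curl Rajasthan virus defective interfering DNA",
    "Cotton leaf curl virus betasatellite defective interfering DNA",
    "Hygrophorus parvirussula",
    "unidentified Cotton leaf curl Rajasthan virus-associated DNA",
]

kept_naive = filter_names(LABELS, is_excluded_substring)
kept_strict = filter_names(LABELS, is_excluded_word)
```

With the substring check every label above is excluded, while the word-based check keeps the Hygrophorus entry. The word-based pattern would still exclude any name with a word ending in "phage" that is not a phage, so it is a starting point rather than a complete rule.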