Closed johausmann closed 6 months ago
For the new data release, we skip the host_tax_id check and check host_scientific_name instead. We accept the values Homo sapiens and NA, on the assumption that a large share of the NA samples come from human donors.
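A minimal sketch of what such a name-based check could look like. The function name `is_accepted_host` and the treatment of empty strings and `None` as equivalent to NA are assumptions for illustration, not the accessor's actual code; the comparison is case-insensitive so that both "Homo sapiens" and "Homo Sapiens" pass.

```python
from typing import Optional

# Values treated as "missing"; counting "" and None as NA is an assumption.
MISSING_VALUES = {"", "na"}


def is_accepted_host(host_scientific_name: Optional[str]) -> bool:
    """Return True for human hosts and for samples with no host name.

    Hypothetical helper: missing names are accepted on the assumption
    that most of them come from human donors.
    """
    if host_scientific_name is None:
        return True
    name = host_scientific_name.strip().lower()
    return name in MISSING_VALUES or name == "homo sapiens"


# Keep only the samples that pass the check (illustrative records).
samples = [
    {"run": "A", "host_scientific_name": "Homo sapiens"},
    {"run": "B", "host_scientific_name": "NA"},
    {"run": "C", "host_scientific_name": "Mus musculus"},
]
kept = [s["run"] for s in samples if is_accepted_host(s["host_scientific_name"])]
```

Note that accepting NA trades precision for recall: non-human samples with a missing host name would also pass.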
Fixed by ENA upstream
During the data update I noticed a serious problem in the ENA accessor. All samples were filtered out because host_tax_id was missing, even though this field is requested in the URL. The field seems to be NA for every sample we query. I also did a quick check with some SARS-CoV-2 ENA runs that were already processed in a previous database update, and they show the same problem.
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&format=tsv&query=%22run_accession=DRR287659%22&fields=host_tax_id,host_scientific_name
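For reproducing the check in a script, the query above can be rebuilt and its TSV response inspected roughly as follows. This is a sketch using only the standard library: the helper names are hypothetical, the sample response text is made up for illustration (it is not real API output), and treating both an empty cell and the literal "NA" as missing is an assumption.

```python
import csv
import io
from urllib.parse import quote, urlencode

ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(run_accession: str, fields: list) -> str:
    """Build the ENA portal search URL for a single run accession."""
    params = {
        "result": "read_run",
        "format": "tsv",
        "query": f'"run_accession={run_accession}"',
        "fields": ",".join(fields),
    }
    # Keep '=' and ',' unescaped so the URL matches the one in this issue.
    return ENA_SEARCH + "?" + urlencode(params, quote_via=quote, safe="=,")


def rows_missing_field(tsv_text: str, field: str) -> list:
    """Return the rows whose `field` is empty or the literal 'NA'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if (row.get(field) or "NA").strip() == "NA"]


url = build_search_url("DRR287659", ["host_tax_id", "host_scientific_name"])

# Illustrative response text only; fetch the real one with, for example,
# urllib.request.urlopen(url).read().decode().
example_tsv = (
    "run_accession\thost_tax_id\thost_scientific_name\n"
    "DRR287659\t\tHomo sapiens\n"
)
missing = rows_missing_field(example_tsv, "host_tax_id")
```

If every returned row ends up in `missing` for host_tax_id but not for host_scientific_name, that matches the behaviour described here.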
However, the host_scientific_name field is returned. We could update the accessor module to filter by either host_tax_id or host_scientific_name.