cov-lineages / lineages-website

Request for lineage_data.full.json update #27

Closed al-obrien closed 1 year ago

al-obrien commented 1 year ago

It appears the lineage_data.full.json has not been updated for about a month. Based upon the text on the Pango website, this JSON file should have the "full set of lineages"; however, it appears to have fallen behind compared to other sources. For example, the DT lineage is not listed.

Will this be updated, and if not, what would be the preferred source for timely lineage lists? Perhaps this location instead?

rmcolq commented 1 year ago

If all you want is the list of lineages, then the pango-designation GitHub repository is the definitive list. If you need the summary statistics as well, you will still want the information from these JSON files. I will take a look at why the update has stalled.

al-obrien commented 1 year ago

Thank you for this. Having the extra information is helpful. I will now also refer to these other sources for the latest lineage lists.

In case it is of interest to others that stumble across this post, I added a helper function below to grab the relevant table in the R language and pull out the 'alias' references:

# Fetch COVID lineage details from a more frequently updated source
fetch_covlin_table <- function(url = 'https://raw.githubusercontent.com/cov-lineages/pango-designation/master/lineage_notes.txt',
                             pattern = '^([Aa]lias of)\\s?([A-Z\\.\\d]*)[,]?\\s.*$',
                             description_col = 'Description', 
                             as_tibble = FALSE) {

  # Load tables
  cov_desc <- data.table::fread(url, sep = '\t', data.table = TRUE)

  # Index those matching the alias pattern
  pattern_index <- grep(cov_desc[[description_col]], pattern = pattern, perl = TRUE)

  # Pre allocate
  cov_desc$full_name <- NA_character_

  # Sub in those that matched the specific pattern
  cov_desc$full_name[pattern_index] <- gsub(cov_desc[[description_col]], 
                                            pattern = pattern,
                                            replacement = '\\2', 
                                            perl = TRUE)[pattern_index]

  data.table::setcolorder(cov_desc, c('Lineage', 'full_name', description_col))

  # Return final table
  if(as_tibble) return(tibble::as_tibble(cov_desc)) else return(cov_desc)
}

AngieHinrichs commented 1 year ago

For those working in python, @corneliusroemer's pango_aliasor package is very useful for expanding or compressing aliases.
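For readers unfamiliar with what expanding and compressing an alias means, here is a minimal standalone sketch in plain Python. It is not pango_aliasor's API: the function names `uncompress` and `compress` and the tiny `ALIAS_KEY` dictionary are hypothetical, hand-written for illustration, and the sketch assumes each alias maps directly to its fully expanded prefix.

```python
# Hand-written, illustrative subset of alias prefixes (an assumption for
# this sketch; a real tool would load the complete, current alias key).
ALIAS_KEY = {
    "BA": "B.1.1.529",
    "AY": "B.1.617.2",
    "Q": "B.1.1.7",
}


def uncompress(lineage, alias_key=ALIAS_KEY):
    """Expand the leading alias letters of a lineage into the full prefix."""
    prefix, _, rest = lineage.partition(".")
    # Prefixes that are not aliases (e.g. "B") are returned unchanged.
    full_prefix = alias_key.get(prefix, prefix)
    return f"{full_prefix}.{rest}" if rest else full_prefix


def compress(lineage, alias_key=ALIAS_KEY):
    """Replace the longest known full prefix with its alias letters."""
    best = None
    for alias, full in alias_key.items():
        # Only alias a prefix that is followed by further sublineage levels.
        if lineage.startswith(full + "."):
            if best is None or len(full) > len(best[1]):
                best = (alias, full)
    if best is None:
        return lineage
    alias, full = best
    return alias + lineage[len(full):]


print(uncompress("BA.5"))          # expanded form of BA.5
print(compress("B.1.617.2.4.2"))   # compressed form using the AY alias
```

A dedicated package is still the better choice in practice, since it keeps the alias key current and handles cases this sketch ignores.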

al-obrien commented 1 year ago

Great to know there are some tools out there for this purpose. I personally have not come across an equivalent package for R users. That said, if there is enough interest I can happily throw one together; time will tell...

rmcolq commented 1 year ago

The file is updating once more! Glad you found some more resources too.

al-obrien commented 1 year ago

Closing the loop on my prior comment: I threw together an R package called {pangoRo}. Hopefully this helps R users in the same way that pango_aliasor has helped Python users! Thank you to @corneliusroemer, whose project inspired me to wrap up some ad hoc code into a dedicated package.