dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

Add links to publication collections #1063

Closed ctb closed 9 years ago

ctb commented 9 years ago

We can also add links to the Google Scholar lists, but those are a lot more haphazard and less useful (although hopefully inclusive).

blahah commented 9 years ago

Sadly, capturing tool citations is ridiculously hard. Publishers could make it easier but they choose not to. We've made some tools at ContentMine and could do some data mining to find probably 95% of the khmer citations. Here's a taste:

$ getpapers --query '"khmer" AND ("transcriptome" OR "metagenome" OR "genome")' --outdir khmer --xml --pdf
info: Found 59 open access results

So, 59 papers possibly citing khmer in just the open access subset of Europe PubMed Central, give or take some false positives. Full list of titles (looks to me like about 12 false positives):

$ jq '.[].title[0]' all_results.json
"Transcriptome analysis of northern elephant seal (Mirounga angustirostris) muscle tissue provides a novel molecular resource and physiological insights."
"A metagenomic approach to characterize temperate bacteriophage populations from Cystic Fibrosis and non-Cystic Fibrosis bronchiectasis patients."
"Draft Genome Sequence of a Papaverine-Degrading, Gram-positive Arthrobacter sp., Isolated from Soil Near Hohenheim, Germany."
"Draft Genome Sequence of Phenylobacterium immobile Strain E (DSM 1986), Isolated from Uncontaminated Soil in Ecuador."
"Transcriptome assembly, profiling and differential gene expression analysis of the halophyte Suaeda fruticosa provides insights into salt tolerance."
"Global transcriptomic profiling demonstrates induction of oxidative stress and of compensatory cellular stress responses in brown trout exposed to glyphosate and Roundup."
"The oak gene expression atlas: insights into Fagaceae genome evolution and the discovery of genes regulated during bud dormancy release."
"Ecological roles of dominant and rare prokaryotes in acid mine drainage revealed by metagenomics and metatranscriptomics."
"These are not the k-mers you are looking for: efficient online k-mer counting using a probabilistic data structure."
"GATB: Genome Assembly & Analysis Tool Box."
"Determining the quality and complexity of next-generation sequencing data without a reference genome."
"De novo Assembly and Analysis of the Northern Leopard Frog Rana pipiens Transcriptome."
"Whole-genome sequences of three symbiotic endozoicomonas strains."
"Biogeography and individuality shape function in the human skin metagenome."
"An introduction to the analysis of shotgun metagenomic data."
"Palaeosymbiosis revealed by genomic fossils of Wolbachia in a strongyloidean nematode."
"Identification and characterization of alternative splicing in parasitic nematode transcriptomes."
"Spatial clustering and risk factors of malaria infections in Ratanakiri Province, Cambodia."
"Inducible defenses stay up late: temporal patterns of immune gene expression in Tenebrio molitor."
"Characterization of the kidney transcriptome of the South American olive mouse Abrothrix olivacea."
"Single cell genomics of uncultured, health-associated Tannerella BU063 (Oral Taxon 286) and comparison to the closely related pathogen Tannerella forsythia."
"Comparative genomics of flatworms (platyhelminthes) reveals shared genomic features of ecto- and endoparastic neodermata."
"Phylogeny and phylogeography of functional genes shared among seven terrestrial subsurface metagenomes reveal N-cycling and microbial evolutionary relationships."
"Comparative genomics of first available bovine Anaplasma phagocytophilum genome obtained with targeted sequence capture."
"Several genes encoding enzymes with the same activity are necessary for aerobic fungal degradation of cellulose in nature."
"Large-scale mitochondrial DNA analysis in Southeast Asia reveals evolutionary effects of cultural isolation in the multi-ethnic population of Myanmar."
"Genetic structure of Qiangic populations residing in the western Sichuan corridor."
"The intergenerational effects of war on the health of children."
"Taiwan Y-chromosomal DNA variation and its relationship with Island Southeast Asia."
"Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots."
"The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus."
"A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems."
"Influenza A(H5N1) virus surveillance at live poultry markets, Cambodia, 2011."
"Computational meta'omics for microbial community studies."
"Phylogenomics and analysis of shared genes suggest a single transition to mutualism in Wolbachia of nematodes."
"Disk-based k-mer counting on a PC."
"Autosomal STRs provide genetic evidence for the hypothesis that Tai people originate from southern China."
"Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads--a baiting and iterative mapping approach."
"Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park."
"Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia."
"Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure."
"Patrilineal perspective on the Austronesian diffusion in Mainland Southeast Asia."
"Artemisinin-resistant malaria: research challenges, opportunities, and public health implications."
"Genetic structure of the Mon-Khmer speaking groups and their affinity to the neighbouring Tai populations in Northern Thailand."
"Human migration through bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum revealed by Y chromosomes."
"Ancestry of the Iban is predominantly Southeast Asian: genetic evidence from autosomal, mitochondrial, and Y chromosomes."
"Southeast Asian diversity: first insights into the complex mtDNA structure of Laos."
"Phylogenetic analysis, based on EPIYA repeats in the cagA gene of Indian Helicobacter pylori, and the implications of sequence variation in tyrosine phosphorylation motifs on determining the clinical outcome."
"Haemoglobinopathies in southeast Asia."
"Oral and Posters, June 7."
$ jq '.[].citedByCount[0]' all_results.json | tr -d '"' | awk '{ sum += $1 } END { print sum }'
292

Those 59 papers were cited 292 times in total - so that's your second-level citation count.
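
The same total can be computed without the tr/awk post-processing. A minimal Python sketch, assuming all_results.json is a list of records whose fields are one-element lists of strings (which is what the jq paths above suggest); the helper name total_citations and the sample records are made up for illustration:

```python
import json

def total_citations(records):
    """Sum citedByCount over records shaped like the all_results.json used above."""
    total = 0
    for record in records:
        # Assumption: each field is a one-element list of strings, e.g. ["12"].
        counts = record.get("citedByCount", [])
        if counts:
            total += int(counts[0])
    return total

# Hypothetical sample, not real data from the query above.
sample = json.loads(
    '[{"citedByCount": ["3"]}, {"citedByCount": ["0"]}, {"title": ["no count"]}]'
)
print(total_citations(sample))
```

To run it on the real results, load the file with json.load and pass the resulting list to total_citations. Records without a citedByCount field are skipped rather than treated as an error.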

$ jq '.[].journalInfo[0].dateOfPublication[0]' all_results.json | tr -d '"' | cut -f1 -d" " | sort -n | uniq -c
   6 2011
   2 2012
  13 2013
  21 2014
   8 2015

The number of papers mentioning khmer has grown overall since 2012; the 2015 count only covers the year so far.
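
The per-year tally can also be sketched in Python. This assumes the same record layout as the jq path above (journalInfo[0].dateOfPublication[0] holding a string that starts with the year) and mirrors the cut/sort/uniq steps; papers_per_year and the sample records are hypothetical:

```python
from collections import Counter

def papers_per_year(records):
    """Count records per publication year, keyed by the year as a string."""
    years = Counter()
    for record in records:
        try:
            # Assumption: nested one-element lists, as in the jq path above.
            date = record["journalInfo"][0]["dateOfPublication"][0]
        except (KeyError, IndexError):
            continue  # skip records with no publication date
        # Keep the first space-separated field, like `cut -f1 -d" "`.
        years[date.split(" ")[0]] += 1
    return dict(sorted(years.items()))

# Hypothetical sample, not real data from the query above.
sample = [
    {"journalInfo": [{"dateOfPublication": ["2014 Jun"]}]},
    {"journalInfo": [{"dateOfPublication": ["2014"]}]},
    {"journalInfo": [{"dateOfPublication": ["2013 Dec 1"]}]},
    {"title": ["record without journalInfo"]},
]
print(papers_per_year(sample))
```

Records without a date are skipped here, so the per-year counts can add up to fewer than the number of records.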

Note that all this is on a small subset of the literature. Deeper analysis requires more time.

Later this year we will have a service where you can subscribe to mentions of your software in the fulltext of papers, but for now we have to do it manually.

ctb commented 9 years ago

On Wed, Jun 03, 2015 at 08:11:29AM -0700, Richard Smith-Unna wrote:

So, 59 papers possibly citing khmer in just the open access subset of Europe Pubmed Central, give or take a couple of false positives. Full list of titles (looks to me like about 12 false positives):

Does your "20% of biomedical literature" statement hold for these results, too? (Thanks!)

blahah commented 9 years ago

Yes, the 20% figure is approximately right for the Europe PMC dataset.

ctb commented 9 years ago

For me to do for 2.0 release; see also #853.

mr-c commented 9 years ago

Done in #1222