Open peterjc opened 1 year ago
Hi Peter,
Well spotted, I think you might be right about this. I'm travelling internationally for the next 3 weeks, but when I'm back in the office I will look into this a bit deeper and discuss it with the other authors. This shouldn't impact the paper's overall conclusions, but a minor correction might be required to fix this potential error.
For reference, my R script intended to export the data underlying Figure 3 for re-plotting outside of R:
# Export Figure 3 data
#
# Assumes you are in RStudio and have run the Figure 3 section in
# https://github.com/alexpiper/HemipteraMetabarcodingMS/blob/master/hemiptera_metabarcoding.Rmd
#
# We're just aggregating the data already filtered and compiled, and exporting as TSV.
# To simplify linking the tables, aggregating at genus level to pool the two Acizzia spp. etc.
library(comprehenr)  # provides to_vec(), used for the caption relabelling below
export_ALL <- df_exp[c("pool_comp", "Genus", "Abundance")]
export_ALL <- export_ALL[order(export_ALL$Genus, export_ALL$pool_comp),]
names(export_ALL)[3] <- "Expected"
export_COI <- df_coi[c("pool_comp", "Genus", "Abundance")]
export_COI <- export_COI[order(export_COI$Genus, export_COI$pool_comp),]
export_COI <- aggregate(. ~ pool_comp + Genus, data = export_COI, sum)
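# are_equal() only returns TRUE/FALSE; wrap in assert_that() so a mismatch stops the script
assertthat::assert_that(assertthat::are_equal(export_COI$pool_comp, export_ALL$pool_comp))
assertthat::assert_that(assertthat::are_equal(export_COI$Genus, export_ALL$Genus))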
export_ALL["COI"] <- export_COI$Abundance
export_18S <- df_18s[c("pool_comp", "Genus", "Abundance")]
export_18S <- export_18S[order(export_18S$Genus, export_18S$pool_comp),]
export_18S <- aggregate(. ~ pool_comp + Genus, data = export_18S, sum)
assertthat::assert_that(assertthat::are_equal(export_18S$pool_comp, export_ALL$pool_comp))
assertthat::assert_that(assertthat::are_equal(export_18S$Genus, export_ALL$Genus))
export_ALL["18S"] <- export_18S$Abundance
export_12S <- df_12s[c("pool_comp", "Genus", "Abundance")]
export_12S <- export_12S[order(export_12S$Genus, export_12S$pool_comp),]
export_12S <- aggregate(. ~ pool_comp + Genus, data = export_12S, sum)
assertthat::assert_that(assertthat::are_equal(export_12S$pool_comp, export_ALL$pool_comp))
assertthat::assert_that(assertthat::are_equal(export_12S$Genus, export_ALL$Genus))
export_ALL["12S"] <- export_12S$Abundance
export_ALL <- export_ALL[order(export_ALL$pool_comp, export_ALL$Genus),]
# Relabel to match desired captions by converting pool_comp which is e.g. 4.5
# for individuals class 4 (i.e. from 100, 250, 500 and 1000) and pool 5:
names(export_ALL)[1] <- "Caption"
export_ALL[1] <- to_vec(for(px in export_ALL[1]) paste(c("100", "250", "500", "1000")[round(10*(px-floor(px)))], "Pool", floor(px)))
write.table(export_ALL, file="figure3.tsv", sep="\t", quote=FALSE, row.names=FALSE)
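A minimal sketch of the caption relabelling step on its own, in case it helps anyone checking the labels. It assumes, as the indexing expression in the script implies, that the integer part of pool_comp is the pool number and the first decimal digit selects one of the four sizes. The function name decode_caption and the input values are made up for illustration and are not taken from the data set.

```r
# Hypothetical pool_comp values, assuming a <pool>.<size class> encoding.
sizes <- c("100", "250", "500", "1000")

decode_caption <- function(px) {
  pool <- floor(px)                     # integer part
  size_class <- round(10 * (px - pool)) # first decimal digit, expected 1 to 4
  paste(sizes[size_class], "Pool", pool)
}

captions <- decode_caption(c(2.1, 2.2, 2.3, 2.4))
print(captions)
```

Under this reading a decimal digit above 4 would index past the end of the sizes vector and give an NA label, so it is worth checking the exported captions for NA.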
The background to this is I'm hoping to include your dataset as a worked example for my tool, https://github.com/peterjc/thapbi-pict/
@alexpiper have you had a chance to double check Figure 3's false-negative placement yet? Thanks
Very sorry for taking this long to get back to you.
I've confirmed that the Diuraphis noxia false negative for 12S annotated on 250 Pool 2 should instead be annotated on 500 Pool 2.
While checking this I also noticed a couple of other issues:
We are planning a correction to the article but the corresponding author is on long service leave so it may take some time. Here is what the figure should look like following the changes:
Thank you Alex.
My alternative analysis concurs with Bactericera cockerelli being absent from 500 Pool 4 in 18S (we differ on a few points, but given the flexibility in threshold settings etc. there was nothing else which worried me).
I'll try to finish up my work on https://github.com/peterjc/thapbi-pict/pull/515 using this as a worked example, and intend to cite Batovska et al. 2021 in the planned paper.
Great to hear the results are generally in agreement. I agree that with the various thresholds and parameters available at each step of the process it's difficult to exactly reproduce the results with a different pipeline, particularly when some taxa are so close to the limit of detection/index switching rate.
I'm keen to try your tool in future, and will follow with interest.
From attempting to recreate the figure from the underlying data (and my own analysis of the FASTQ files), I think there is a misplaced False Negative annotation on Figure 3. I do not believe this change materially affects the paper, but it is a point of confusion when attempting to replicate/reproduce the results.
Referring to Figure 3 as shown in the published manuscript https://doi.org/10.1038/s41598-021-85855-6 (Batovska et al. 2021), or here:
The right-most column (12S amplicon), lines 5 to 8 (Pool 2), are mostly pale green (R. padi). There are orange circled minus symbols indicating D. noxia false negatives on lines 6 and 8 (i.e. 250 Pool 2, and 1000 Pool 2).
I believe those should be on lines 7 and 8, i.e. 500 Pool 2 and 1000 Pool 2, shown visually here with purple annotation:
Zooming in to the original PDF there does appear to be a thin orange line on line 6:
From within RStudio, having run enough of the provided R code to have the df_12s dataframe:
My interpretation is that those first two lines say Diuraphis noxia 12S was found at about 0.3% in 100-Pool-2 and about 0.1% in 250-Pool-2 (and also 100-Pool-4, 250-Pool-3 and 250-Pool-4 according to the next three lines). The rest are zero, i.e. 500-Pool-2 and 1000-Pool-2 are false negatives.
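The reading above amounts to flagging rows where a genus is expected but has zero observed abundance. A rough sketch of that check follows. The rows are illustrative only: the observed values echo the approximate figures quoted above, while the Expected values and the column name obs_12S are placeholders I have made up, not values from the data set.

```r
# Illustrative rows only; Expected values are placeholders, not real data.
fig3 <- data.frame(
  Caption  = c("100 Pool 2", "250 Pool 2", "500 Pool 2", "1000 Pool 2"),
  Genus    = "Diuraphis",
  Expected = c(0.01, 0.004, 0.002, 0.001),
  obs_12S  = c(0.003, 0.001, 0, 0),
  stringsAsFactors = FALSE
)

# A false negative here means: expected in the pool, but nothing observed.
false_neg <- fig3[fig3$Expected > 0 & fig3$obs_12S == 0, ]
print(false_neg$Caption)
```

With these placeholder rows the check picks out 500 Pool 2 and 1000 Pool 2, matching where I think the annotations belong.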
Possible evidence from output/csv/seperateloci/ earlier in the analysis:
If I am interpreting that correctly, Diuraphis noxia was found at about 0.3% in 100-Pool-2, and 0.1% in 250-Pool-2 (both runs), but not in 500-Pool-2 nor 1000-Pool-2.
Same with the genus CSV file: