ESHackathon / CiteSource

http://www.eshackathon.org/CiteSource/
GNU General Public License v3.0

plots: source "unknown" #135

Closed: TNRiley closed this issue 1 year ago

TNRiley commented 1 year ago

Plots now show an "unknown" source for citations that do not have a cite_source, such as citations from screened or final files. This label is applied during deduplication. I'm not sure whether it is better to filter these out in each plot or to change the deduplication function.

LukasWallrich commented 1 year ago

Good question. ASySD now adds "unknown" in place of missing values (here). To me, that seems problematic because it does not allow users to distinguish between NAs (which might indicate all kinds of things, such as "not applicable") and explicit unknowns.

If @kaitlynhair wants to keep this in ASySD, it seems to me that we should replace NAs with something else in CiteSource (e.g., CITE_TRUE_NA) and then remove that value again afterwards. Otherwise, we prevent users from using "unknown", which might well be a desirable label (for instance, when the label is used for something like pre-registration status).
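The sentinel idea can be sketched as follows. CiteSource and ASySD are R packages; this is a Python illustration of the logic only, and every name in it (`protect_missing`, `merge_field`, `restore_missing`) is hypothetical. It assumes merged fields are stored as ", "-separated strings. Missing values are swapped for `CITE_TRUE_NA` before merging and stripped out afterwards, so a user's explicit "unknown" label survives.

```python
# Illustrative only: these helpers are hypothetical, not part of CiteSource or ASySD.
SENTINEL = "CITE_TRUE_NA"  # placeholder value suggested in the discussion


def protect_missing(records, field):
    """Return copies of `records` with missing values (None or "") in `field` replaced by SENTINEL."""
    return [
        {**rec, field: SENTINEL if rec.get(field) in (None, "") else rec[field]}
        for rec in records
    ]


def merge_field(values):
    """Merge one field across duplicate records into a comma-separated string.

    Each value may itself be a comma-separated list; repeated entries are kept once,
    in first-seen order.
    """
    merged = []
    for value in values:
        for part in value.split(", "):
            if part not in merged:
                merged.append(part)
    return ", ".join(merged)


def restore_missing(merged):
    """Remove SENTINEL after merging; an explicit "unknown" is left untouched."""
    parts = [p for p in merged.split(", ") if p != SENTINEL]
    return ", ".join(parts) if parts else None


# Three records judged to be duplicates of the same study.
duplicates = [
    {"id": 1, "cite_source": "WoS"},
    {"id": 2, "cite_source": None},
    {"id": 3, "cite_source": "PubMed"},
]
protected = protect_missing(duplicates, "cite_source")
merged = merge_field(r["cite_source"] for r in protected)
print(merged)                   # WoS, CITE_TRUE_NA, PubMed
print(restore_missing(merged))  # WoS, PubMed
```

Because only the sentinel is removed, a merged value such as "unknown, CITE_TRUE_NA" comes back as "unknown", and a group with no source at all comes back as None rather than an empty string.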

TNRiley commented 1 year ago

Talked with @kaitlynhair about this. She essentially wanted to avoid sources or labels with blanks coming out as "WoS, , PubMed" or "WoS, PubMed" when three sources had been merged together. However, I'm not sure this is an issue in CiteSource, as there will often be times when screened/final files are added without a source.

Kaitlyn is open to changing it to NA, but I also suggested NULL, since there is no value and no expected value. @LukasWallrich, what are your thoughts on NA versus NULL? This will also affect cite_label.

kaitlynhair commented 1 year ago

This could also be one of those things that make sense in ASySD but not in CiteSource. ASySD is purely for systematic searches, but because we want to look at different stages in CS, it may not make sense to have this at all. Wouldn't it make more sense for a "final" citation to show only "PubMed, WOS" rather than "PubMed, WOS, NULL"?

LukasWallrich commented 1 year ago

Good point, though I could see similar uses for ASySD on other fields. Would it make sense to just add a "replace_na" option to ASySD that defaults to "unknown"? If it is set to "", we would need to replace ", ," with ", ", ideally in ASySD itself, since that format never seems desirable.
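A rough Python sketch of the proposed option follows (ASySD itself is an R package). Here `replace_na` only mirrors the idea in the comment above and is not a confirmed ASySD argument; `merge_field` and `tidy_merged` are hypothetical helpers. Instead of patching ", ," in the joined string, the first helper drops empty entries before joining; the second cleans strings that were already merged with blanks.

```python
# Illustrative only: `replace_na` mirrors the option proposed above; it is not a confirmed ASySD argument.


def merge_field(values, replace_na="unknown"):
    """Merge one field across duplicate records into a comma-separated string.

    Missing values (None) become `replace_na`. Empty entries are dropped before
    joining, so replace_na="" never produces "WoS, , PubMed".
    """
    filled = [replace_na if v is None else v for v in values]
    return ", ".join(v for v in filled if v != "")


def tidy_merged(text):
    """Clean an already-merged string, e.g. "WoS, , PubMed" -> "WoS, PubMed"."""
    parts = (p.strip() for p in text.split(","))
    return ", ".join(p for p in parts if p)


print(merge_field(["WoS", None, "PubMed"]))                 # WoS, unknown, PubMed
print(merge_field(["WoS", None, "PubMed"], replace_na=""))  # WoS, PubMed
print(tidy_merged("WoS, , PubMed"))                         # WoS, PubMed
```

Filtering before joining also handles leading and trailing blanks (", WoS" or "WoS, "), which a plain ", ," to ", " substitution would miss.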

kaitlynhair commented 1 year ago

Added an argument, show_unknown_tags, to ASySD. It is now set to FALSE in CS, which resolves our plot issues.