ESHackathon / CiteSource

http://www.eshackathon.org/CiteSource/
GNU General Public License v3.0

Vignette #27

Closed: rootsandberries closed this issue 1 year ago

rootsandberries commented 2 years ago

I added a vignette folder and file. I'm still not clear on what we've settled on in terms of importing, etc., but does this general workflow look reasonable?

About the package

CiteSource provides users with the ability to deduplicate references while maintaining customizable metadata. Instead of the traditional deduplication method where records are removed and only one record is selected to be retained, CiteSource retains each duplicate record while merging metadata into a single main record. This main record maintains user-customized metadata in two fields, "Source" and "Tag". In the merging process, select metadata fields are also automatically compared (currently DOI & Abstract) and the most complete metadata is used in the main record.
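
To make the merging idea concrete, here is a minimal base R sketch. It is not CiteSource's implementation: it assumes duplicates can be matched on an identical DOI, and the data frame, its column names, and the DOIs are made up for illustration. Each group of duplicates collapses into one main record that keeps the most complete abstract and lists every source.

```r
# Toy records (hypothetical data): two copies of one article from different
# sources, plus one unrelated article.
raw <- data.frame(
  doi      = c("10.1000/a", "10.1000/a", "10.1000/b"),
  abstract = c(NA, "Full abstract text", "Another abstract"),
  source   = c("ASFA", "EconLit", "GreenFILE"),
  stringsAsFactors = FALSE
)

# Collapse one group of duplicates into a single main record.
merge_group <- function(g) {
  abstracts <- g$abstract[!is.na(g$abstract)]
  data.frame(
    doi = g$doi[1],
    # Keep the longest non-missing abstract as the "most complete" one.
    abstract = if (length(abstracts) > 0) {
      abstracts[which.max(nchar(abstracts))]
    } else {
      NA_character_
    },
    # Record every source the citation was found in.
    source = paste(sort(unique(g$source)), collapse = ", "),
    stringsAsFactors = FALSE
  )
}

# Assumption: exact DOI match defines a duplicate (real deduplication is fuzzier).
merged <- do.call(rbind, lapply(split(raw, raw$doi), merge_group))
rownames(merged) <- NULL
print(merged)
```

The first merged row carries the abstract from the EconLit copy and the source string "ASFA, EconLit", so no provenance is lost when the duplicate is collapsed.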

Installation

Use the following code to install CiteSource.
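
The install code did not come through in this post. A likely form, assuming the package is installed from the ESHackathon/CiteSource GitHub repository with the remotes package (an assumption, not confirmed in this thread), would be:

```r
# Assumption: CiteSource is installed from GitHub rather than CRAN.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ESHackathon/CiteSource")

library(CiteSource)
```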

Import files from multiple sources

Currently, users can import multiple RIS files into CiteSource, which will be labelled with source information such as database, platform, and a search ID. The latter can be used to specify search parameters.


my_records <- read_citations(c("asfa.ris", "econlit.ris", "greenfile.ris"),
                             database = c("ASFA", "EconLit", "GreenFILE"),
                             platform = c("ProQuest", "EBSCO", "EBSCO"),
                             search_ids = c("Search1", "Search2", "Search3"))

Deduplicate while maintaining source information

CiteSource allows users to merge duplicates while maintaining information ...


unique_citations <- dedup_citations(my_records)

data_sources <- source_comparison(unique_citations)

Source or method analysis

When teams select databases for a review, it can be extremely difficult to identify the best resources and to estimate the return on investment (ROI) for the time spent running searches. This is especially true in environmental research, which is often cross-disciplinary. By tracking where and how each citation was found, the evidence synthesis community could track the efficacy of various databases and identify the resources most relevant to a given research topic. The same idea extends to comparing search strings, strategies, and methodologies.

Plot overlap as a heatmap matrix


my_heatmap <- plot_source_overlap(data_sources, 
                                  plot_type = "percentages")
my_heatmap 
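
For readers who want to see what kind of numbers sit behind such a heatmap, here is a small base R sketch. It is an illustration only: the membership matrix is toy data, and the percentage definition (share of the row source's records that also appear in the column source) is an assumption, since this thread does not state how plot_source_overlap computes its percentages.

```r
# Toy membership matrix (hypothetical data): one row per unique citation,
# one column per source, 1 = the citation was found in that source.
membership <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 1, 0,
    1, 1, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("ASFA", "EconLit", "GreenFILE"))
)

# Pairwise overlap counts: entry [i, j] is the number of citations found in
# both source i and source j; the diagonal holds each source's total.
overlap_counts <- crossprod(membership)

# Assumed percentage definition: divide each row by that source's total.
overlap_pct <- sweep(overlap_counts, 1, diag(overlap_counts), "/") * 100

print(overlap_counts)
print(round(overlap_pct, 1))
```

Because each row is divided by its own total, the percentage matrix is not symmetric: half of the GreenFILE records also appear in ASFA, but only a third of the ASFA records appear in GreenFILE.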

Plot overlap as an upset plot


my_upset_plot <- plot_source_overlap_upset(data_sources)
my_upset_plot

Review stage analysis

Once title and abstract screening has been completed, or once the final literature has been selected, users can analyze the contribution of each Source/Method to better understand its impact on the review. By using the "Source" data along with the "Tag" data, users can analyze the number of overlapping/unique records from each source or method.
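
As a rough illustration of counting overlapping and unique records per source, here is a base R sketch on toy data. The data frame, its column names, and the comma-separated source field are assumptions made for the example, not the structure CiteSource actually produces.

```r
# Toy deduplicated records (hypothetical data): one row per unique citation;
# "source" lists every source in which the citation was found.
records <- data.frame(
  id     = 1:5,
  source = c("ASFA", "ASFA, EconLit", "EconLit",
             "ASFA, EconLit, GreenFILE", "GreenFILE"),
  stringsAsFactors = FALSE
)

# Split the merged source field into one character vector per record.
sources_per_record <- strsplit(records$source, ", *")
all_sources <- sort(unique(unlist(sources_per_record)))

# Record-by-source membership matrix (TRUE = found in that source).
membership <- t(vapply(
  sources_per_record,
  function(s) all_sources %in% s,
  logical(length(all_sources))
))
colnames(membership) <- all_sources

# A record is unique to a source when it was found in exactly one source.
found_once <- rowSums(membership) == 1

summary_table <- data.frame(
  source = all_sources,
  total  = unname(colSums(membership)),
  unique = unname(colSums(membership & found_once))
)
summary_table$shared <- summary_table$total - summary_table$unique
print(summary_table)
```

Running the same count separately on the records kept at each review stage would show how each source's unique contribution changes as screening proceeds.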

Assess contribution of sources by review stage

Documentation and output

Generate a search summary table

Export deduplicated files

LukasWallrich commented 2 years ago

@rootsandberries I was just trying to get the vignette (citesource_vignette.rmd) to compile and can't find the data that it relies on. Could you point me towards it? Or is the data_v4 folder something you only have locally? If so, could you add it into the vignettes folder?

rootsandberries commented 2 years ago

I just updated the vignette and added the new data we're using. I don't know if this has been raised yet, but when we were testing these new files, we noticed an odd problem (which I now realize was also occurring when I ran my sample files). When the charts are created, there is a group of records that seem to be untagged with a source name (just called source_). It's unclear where they are coming from, but they appear to have no other metadata associated with them. When I looked at the various dataframes that are created along the way, I noticed a bunch of rows of NAs appearing in the n_unique file.

I'll create a new issue for this. Curious to know if you see the same.

LukasWallrich commented 2 years ago

Great thanks - good catch & let's continue that discussion in the new issue.

I have now added the export functions to the bottom of your vignette - could you check that they do what you expect, and maybe add some narrative?

rootsandberries commented 2 years ago

I've updated the vignette to describe the new export features. However, the RIS export is not importing into Zotero. I'm getting this message from Zotero: "The selected file is not in a supported format."

Otherwise, I think the .csv file looks good!

LukasWallrich commented 2 years ago

This RIS issue is very odd, and it is not directly related to our extra fields: it should also happen when we only use synthesisr (and it works when we only use WoS.ris and WoS_Early.ris). I have now narrowed it down to one RIS entry that imports and one that doesn't. I don't have the time tonight to identify the tags that are to blame, but you might be faster at spotting which ones might be non-standard or problematic anyway.

Import fails

DB  - McK
C8  - 
C7  - 
PY  - 2006
TI  - Population structure and habitat use of baboons (Papio hamadryas ursinus) in the Blyde Canyon Nature Reserve
SP  - 67-76
AB  - NA
TY  - JOUR
SO  - Koedoe
VL  - 49
IS  - 2
N1  - Cited By (since 1996):2 Export Date: 6 January 2014 Source: Scopus
AU  - Marais, A. J.
AU  - Brown, L. R.
AU  - Barrett, L.
AU  - Henzi, S. P.
UR  - http://www.scopus.com/inward/record.url?eid=2-s2.0-34248546137&partnerID=40&md5=c05c14faa0fd4d020171d5ef8af3bdb7
KW  - Chacma baboons
KW  - Habitat use
KW  - Home range
KW  - Mpumalanga
KW  - Population structure
AD  - Applied Behavioural Ecology and Ecosystem Research Unit, University of South Africa, Pretoria, South Africa Private Bag X6, Florida 1710, South Africa Department of Psychology, University of Central Lancashire, Lancashire, United Kingdom
JO  - Koedoe
ER  - 

Import succeeds

DB  - WoS
AB  - NA
TY  - JOUR
AD  - {NEWBURY HOUSE, 900 EASTERN AVE, NEWBURY PARK, ILFORD, ESSEX IG2 7HH, ENGLAND}
PY  - 2002
TI  - Land tenure systems and their impacts on agricultural investments and productivity in Uganda
SO  - Journal of Development Studies
VL  - 38
IS  - 6
SP  - 105--128
DO  - 10.1080/00220380412331322601
AU  - Place, F.
AU  - Otsuka, K.
JO  - Journal of Development Studies
BN  - 0022-0388
ER  - 
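
One way to shortlist candidate tags is to compare the two entries mechanically. The base R sketch below lists the tags that appear in only one entry, the tags left empty only in the failing entry, and where TY sits in each. It only produces candidates; it does not establish which tag Zotero actually rejects. The helper names ris_tags and empty_tags are made up for this example.

```r
# Tag lines copied from the two entries above; long values are abridged with
# "..." because only the tags matter for this comparison.
failing <- c(
  "DB  - McK", "C8  - ", "C7  - ", "PY  - 2006",
  "TI  - Population structure and habitat use of baboons ...",
  "SP  - 67-76", "AB  - NA", "TY  - JOUR", "SO  - Koedoe", "VL  - 49",
  "IS  - 2", "N1  - Cited By (since 1996):2 ...",
  "AU  - Marais, A. J.", "AU  - Brown, L. R.", "AU  - Barrett, L.",
  "AU  - Henzi, S. P.", "UR  - http://www.scopus.com/...",
  "KW  - Chacma baboons", "KW  - Habitat use", "KW  - Home range",
  "KW  - Mpumalanga", "KW  - Population structure",
  "AD  - Applied Behavioural Ecology ...", "JO  - Koedoe", "ER  - "
)
succeeding <- c(
  "DB  - WoS", "AB  - NA", "TY  - JOUR", "AD  - {NEWBURY HOUSE, ...}",
  "PY  - 2002", "TI  - Land tenure systems ...",
  "SO  - Journal of Development Studies", "VL  - 38", "IS  - 6",
  "SP  - 105--128", "DO  - 10.1080/00220380412331322601",
  "AU  - Place, F.", "AU  - Otsuka, K.",
  "JO  - Journal of Development Studies", "BN  - 0022-0388", "ER  - "
)

# Two-character tag of every line shaped like "XX  - value".
ris_tags <- function(lines) {
  substr(lines[grepl("^[A-Z][A-Z0-9]  -", lines)], 1, 2)
}

# Tags whose value is empty.
empty_tags <- function(lines) {
  substr(lines[grepl("^[A-Z][A-Z0-9]  - *$", lines)], 1, 2)
}

only_in_failing <- setdiff(ris_tags(failing), ris_tags(succeeding))
only_in_succeeding <- setdiff(ris_tags(succeeding), ris_tags(failing))
empty_only_in_failing <- setdiff(empty_tags(failing), empty_tags(succeeding))
ty_position <- c(
  failing = which(ris_tags(failing) == "TY"),
  succeeding = which(ris_tags(succeeding) == "TY")
)

print(only_in_failing)        # "C8" "C7" "N1" "UR" "KW"
print(only_in_succeeding)     # "DO" "BN"
print(empty_only_in_failing)  # "C8" "C7"
print(ty_position)            # failing 8, succeeding 3
```

Removing the candidate tags one at a time from the failing entry and retrying the Zotero import would then show which difference matters.
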
LukasWallrich commented 1 year ago

@TNRiley @rootsandberries Just to note - I automatically reformatted the code in the package to be in line with tidyverse conventions. That should make it easier to read, particularly in the .R files. If anything in the vignettes looks odd now, then I'm sorry - feel free to revert ...