Shiny/GUI - Githubissues

TNRiley commented 2 years ago

Focusing on the shiny interface around the following use cases as I believe that this covers what we've talked about.

Single merged record: The ability to create a single record and include preferred metadata for selected fields based on the metadata attributes (filled or not, length) is its own unique use case. This was the "metadata enhancement" use case that was in the Google Doc.

Examples: IF one abstract contains text and the other does not, choose the metadata from the complete record for the final merged record. IF one record's author field contains more text than the other's, choose the metadata from the more complete record.
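A minimal sketch of the "choose the more complete field" rule described above, written in Python purely for illustration (the project itself is R/Shiny). The function name `merge_records` and the dictionary layout are assumptions, not anything from the CiteSource code; "more complete" is approximated here as "longer after trimming whitespace".

```python
def merge_records(a: dict, b: dict) -> dict:
    """Merge two duplicate records, keeping the more complete value per field.

    Hypothetical helper: 'more complete' is assumed to mean 'longer text'.
    """
    merged = {}
    for field in sorted(set(a) | set(b)):
        left = (a.get(field) or "").strip()
        right = (b.get(field) or "").strip()
        # A filled value beats an empty one; if both are filled,
        # the longer one wins (ties go to the first record).
        merged[field] = left if len(left) >= len(right) else right
    return merged


rec_a = {"title": "Fish stocks", "abstract": "", "author": "Smith, J."}
rec_b = {"title": "Fish stocks", "abstract": "We surveyed...", "author": "Smith, John; Lee, A."}
merged = merge_records(rec_a, rec_b)
```

Length is a crude proxy for completeness, so a real implementation would probably want field-specific rules (for example, counting authors rather than characters).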

High-level analysis: Databases/Platforms/Indexes analysis

crossover
uniqueness
(No string information included)

Examples:

Databases/Database: GreenFile vs. CAB Direct vs. Aquatic Sciences and Fisheries Abstracts (ASFA) vs. Water Resources Abstracts
Platform/Indexes: Web of Science - Science Citation Index Expanded (YR-YR) vs. Core Collection vs. "ALL Databases" (YR-YR)
Internal Publisher: ProQuest - (ASFA) vs. ProQuest Earth - Atmospheric & Aquatic Science Datab (EAAS)
Search Engine/Database: Google Scholar vs. ASFA OR Web of Science etc...

Mid level: Single database - multi-string/strategy analysis

compare search results against known seed articles/post title abstract/final included articles
(there is a lot of potential to analyze best cases in string development, etc. using this)

Example: ASFA Search 1 vs Search 2 vs Search 3

Deeper level (this is an area I have a hard time envisioning specific use cases for and see as very niche): Multiple database - multiple string. Same use cases as mid level, but with the ability to analyze across databases as well.

Examples:

ASFA string 1 vs EAAS string 1
ASFA string 1 vs EAAS string 2
ASFA string 3 vs EAAS string 2
ASFA string 1, 3 vs EAAS string 2

Summary data for flow diagrams

number of results
sources of databases
duplicates removed
number of unique removed vs. crossover (I'm sure there could be some cool research question as to quality and availability?)
etc.
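To make the flow-diagram numbers above concrete, here is a rough Python sketch (illustrative only, not the package's R code). It assumes each citation already carries a deduplication key, called `record_id` here; that key and the function name `flow_summary` are hypothetical.

```python
from collections import Counter, defaultdict


def flow_summary(records):
    """Summarise (source, record_id) pairs for a flow diagram.

    record_id is an assumed deduplication key shared by duplicate citations.
    """
    records = list(records)
    sources_by_id = defaultdict(set)
    for source, record_id in records:
        sources_by_id[record_id].add(source)

    per_source = Counter(source for source, _ in records)
    # A record is 'unique' if only one source found it, 'crossover' otherwise.
    unique = sum(1 for s in sources_by_id.values() if len(s) == 1)
    crossover = sum(1 for s in sources_by_id.values() if len(s) > 1)
    return {
        "total_results": len(records),
        "results_per_source": dict(per_source),
        "distinct_records": len(sources_by_id),
        "duplicates_removed": len(records) - len(sources_by_id),
        "unique_to_one_source": unique,
        "crossover": crossover,
    }


hits = [("WoS", "r1"), ("WoS", "r2"), ("ASFA", "r2"), ("ASFA", "r3"), ("LENS", "r2")]
summary = flow_summary(hits)
```

With the toy data above, five raw hits collapse to three distinct records: two found by a single source and one shared across all three.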

rootsandberries commented 2 years ago

Also a note to add to the 'mid-level' analysis: I think there is a case here for comparing screening sets (i.e. final included, title/abstract, excluded etc.) to not just database searches but also RIS files from supplementary searching like citation chasing, hand-searching, etc. Really a minor differentiation as I imagine these would just be noted upon upload, in the same way database would be, but just want to make sure the thinking is inclusive of all sources, not just database searches.

TNRiley commented 2 years ago

Yes, comparing the use of other methods and tools would be great. I had it in my head but it's good to make sure it's included as an example! I had included Google Scholar as an example in high level. And yes, I think it would just be noted as whatever name you give it upon upload. This would also be the case when loading seed articles OR final articles vs. the initial .ris if you wanted to look at a complete project.

nealhaddaway commented 2 years ago

Great point Sarah. That’s also why I think it’s better to refer to ‘source’ and not prescribe database/platform/string etc. @chriscpritchard - some blocks of records might not come from db searches.

LukasWallrich commented 2 years ago

Considering these use cases, would it be important to have hierarchical labels, so that you could compare (e.g.,) databases vs other sources at one level and then drill deeper? Or is it sufficient if comparison takes place at one (user-defined) level of granularity?

Having the ability to compare multiple levels would maybe also be a good way to see - at a glance - where the original hits vs the included articles came from?

nealhaddaway commented 2 years ago

I think the best approach is not to specify what the labels are because there are no clear definitions of what 'databases' are amongst users. People think platforms are databases, database uses differ depending on platforms, etc.

The best approach I think is to use user-defined labels for whatever they think a 'source' is, and let them add complex information in additional fields (as I suggest in the JSON embedded in the RIS) to let them interpret the outputs we give.

You can't replicate search strings across many databases because the syntax isn't the same, but more than that, different platforms accessing the same database have different search options, it just gets really messy. I say let the user interpret what the labels mean, but don't get stuck in the mud of it.

TNRiley commented 2 years ago

I think the best approach is not to specify what the labels are because there are no clear definitions of what 'databases' are amongst users. People think platforms are databases, database uses differ depending on platforms, etc.

Yes

The best approach I think is to use user-defined labels for whatever they think a 'source' is, and let them add complex information in additional fields (as I suggest in the JSON embedded in the RIS) to let them interpret the outputs we give.

I agree with this. People can record a simple WoS 1.1, WoS 1.2, WoS 1.3, ASFA 1.1, etc... I believe that users who are documenting will have that data in their record of searches/strings/results, etc.

I don't think the complex data belongs in the .ris itself. I still have a hard time envisioning JSON embedded into the .ris, but honestly if Neal thinks that has applications - I'm on board and want to learn more about how this looks and what could be done.

TNRiley commented 2 years ago

Considering these use cases, would it be important to have hierarchical labels, so that you could compare (e.g.,) databases vs other sources at one level and then drill deeper? Or is it sufficient if comparison takes place at one (user-defined) level of granularity?

Do we think that using a string version name for the Source Name would be able to solve this? WoS 1.1, WoS 1.2, WoS 1.3, ASFA 1.1, etc..

I'm trying to think of an example where 2 customizable fields might come into use... would it make things easier for visualization possibly? Source 1 = WoS Source 2 = 1.1

Having the ability to compare multiple levels would maybe also be a good way to see - at a glance - where the original hits vs the included articles came from?

I think that this could still be accomplished by naming each Source, such as WoS 1.1. In the end you could look back and see which .ris the final record was associated with, as the custom Source metadata would be merged from each of the overlapping records.

TNRiley commented 2 years ago

Coming back to this as I was trying to understand what the visual would look like when tracking the number of overlapping/unique citations from a source over time (initial .ris -> post screening .ris -> final included .ris)

I think that I have now convinced myself that we need the 2 custom inputs (Source and Tag).

Example:

A user uploads the initial .ris for 4 different databases: WoS, ASFA, LENS, PMC
These are all named accordingly for the Source (you could even have WoS 1.1, 1.2, etc. for unique strings)
They are Tagged in the second field as "initial results" (or however the user wants)
The user could then upload the post-screened .ris and the final included .ris files
These are Tagged in the second field accordingly as "Screened" and "Final" (as an example)
These can be left without a Source name as this is not needed (CiteSource will identify the matching citations)

This could produce a visual to show the number and percent of Crossover/Unique records for each source and how that number changed throughout the process. Final inclusion numbers can be compared to initial .ris numbers to analyze string efficacy as well as database efficacy and impact.
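Here is a rough sketch of the counting behind that visual, in Python for illustration only. It assumes matching citations share a deduplication key (`record_id`, hypothetical) and that later-stage files carry a Tag but no Source, as in the example above; `stage_counts` is a made-up name.

```python
from collections import defaultdict


def stage_counts(rows, initial_tag="initial results"):
    """Count unique/crossover records per Source at each Tag stage.

    rows: (source, tag, record_id) triples; source may be None for
    later-stage files. record_id is an assumed deduplication key.
    """
    sources_by_id = defaultdict(set)  # record_id -> sources in the initial upload
    ids_by_tag = defaultdict(set)     # tag -> record_ids seen at that stage
    for source, tag, record_id in rows:
        ids_by_tag[tag].add(record_id)
        if tag == initial_tag and source:
            sources_by_id[record_id].add(source)

    table = defaultdict(dict)  # source -> tag -> {"unique": n, "crossover": n}
    for tag, ids in ids_by_tag.items():
        for record_id in ids:
            # Attribute later-stage records back to the sources that found them.
            sources = sources_by_id.get(record_id, set())
            kind = "unique" if len(sources) == 1 else "crossover"
            for source in sources:
                cell = table[source].setdefault(tag, {"unique": 0, "crossover": 0})
                cell[kind] += 1
    return {source: dict(tags) for source, tags in table.items()}


rows = [
    ("WoS", "initial results", "r1"),
    ("WoS", "initial results", "r2"),
    ("ASFA", "initial results", "r2"),
    ("ASFA", "initial results", "r3"),
    (None, "Screened", "r1"),
    (None, "Screened", "r2"),
    (None, "Final", "r2"),
]
counts = stage_counts(rows)
```

Each `counts[source][tag]` cell could feed one bar in a chart, so you can watch a source's unique and crossover contributions shrink from initial results to final inclusion. A source with no surviving records at a stage simply has no entry for that Tag in this sketch.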

Hoping this makes sense!

nealhaddaway commented 2 years ago

Yeah I like this idea - I think it makes sense. We would be able to avoid asking the user to add the tag if we have a separate file upload. So an option to upload an RIS of different stages and label those stages.

TNRiley commented 2 years ago

Home/About Load Visualize-Source Visualize-Source/Tag Export

nealhaddaway commented 2 years ago

Thanks, Trevor. I've got a working mockup here. Not much functionality so far but it describes the records in your uploaded RIS files.

This is a shiny branch: https://github.com/ESHackathon/CiteSource/tree/shiny and a working app here: https://estech.shinyapps.io/citesource/

If you want to draft some explanatory text, that would be cool! Maybe we also need some graphics, a hex/logo...?

TNRiley commented 2 years ago

Great. Kaitlyn and I talked this morning and I ran some of the wireframes by her in order to talk things out a bit. I can add explanatory text for these as well as for the Home/About page. I just uploaded the "visualize" tab too

TNRiley commented 2 years ago

Home/About

I based this on the AIR EGM shiny about page because I thought that it looked nice and clean. I think having the ability to add some buttons on the right side like it does for Use Cases, etc. helps keep it clean

TNRiley commented 2 years ago

Load

Pretty simple with ability to add a source and a tag for each .ris uploaded. Below you will see the "Uploads" section so that the user can verify the upload, source, tag, and number of records.

Kaitlyn and I thought that "Merge records" would be good for the button, but happy to hear thoughts

TNRiley commented 2 years ago

Visualize-Source

Here the user can choose to analyze records by Source, by Tag, or by a combination of Source & Tag

I think of the visualizations for the "Source" and "Tag" options as being the same, just focused on that particular data.

Users will also be able to select which visuals they want to see (thinking about this now those might want to be radio buttons rather than check boxes, I guess it depends on how everything would display)

The visualizations for the Source/Tag option are where there might be different use cases, and I'd love to hear more about potential visualizations that could be employed.

nealhaddaway commented 2 years ago

Hi Trevor, I think I'm missing where the wireframes are being uploaded - is there a link somewhere?

TNRiley commented 2 years ago

Sorry, I have been editing and linking them to the original comment from about an hour ago

nealhaddaway commented 2 years ago

Bingo: https://github.com/ESHackathon/CiteSource/issues/17#issuecomment-1048875830

Thanks, sorry. I'll start editing :)

TNRiley commented 2 years ago

Bingo: #17 (comment)

Didn't know I could do that, I'm learning about github ;)

nealhaddaway commented 2 years ago

Haha me too - I was just guessing... it worked ;)

So here's a question - do we want people to upload and label files one by one, or upload in bulk and then label their uploads? I get the feeling the latter is more streamlined because it can be prefilled with defaults based on the file contents, but the former is certainly programmatically easier lol

TNRiley commented 2 years ago

#17 (comment)

Visualize-Source/Tag: Added one bar chart option where a user could visualize the change in crossover/unique at different stages using both Source/Tag

TNRiley commented 2 years ago

So here's a question - do we want people to upload and label files one by one, or upload in bulk and then label their uploads? I get the feeling the latter is more streamlined because it can be prefilled with defaults based on the file contents, but the former is certainly programmatically easier lol

I think for workflow I would (myself) prefer to add Source and Tag as I import them. If we did do a bulk upload with multiple .ris files, we would need to add a filename column in the upload section. I'm trying to envision the process of renaming once they are in there. Could we do something like: IF the source tag is left empty, it will be filled with the filename? That might work for bulk...
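The filename fallback could be as simple as the sketch below (Python for illustration; the Shiny app would do the equivalent in R). It assumes "source tag" means the Source field and that the file extension should be dropped; `default_source` is a hypothetical name.

```python
from pathlib import PurePath


def default_source(filename: str, source: str = "") -> str:
    """Return the user's Source label, or the file name (minus extension) if blank."""
    source = source.strip()
    if source:
        return source
    # Assumption: strip any folder path and the .ris extension for the default.
    return PurePath(filename).stem


uploads = [("wos_search1.ris", "WoS 1.1"), ("asfa_search1.ris", ""), ("exports/lens.ris", "  ")]
labels = [default_source(name, src) for name, src in uploads]
```

The user could still overwrite the prefilled label in the "Uploads" table afterwards.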

nealhaddaway commented 2 years ago

Cool yeah we can force one-by-one file upload for sure! I can then add fields to be added ('source' and 'tag' label).

Just wondering how to make sure we build in options for selecting a field from the upload or extracting it from the embedded data... I'm trying not to build the shiny in a way that restricts use...

TNRiley commented 2 years ago

Just wondering how to make sure we build in options for selecting a field from the upload or extracting it from the embedded data... I'm trying not to build the shiny in a way that restricts use...

I was thinking that pulling pre-entered metadata from a field into Source/Tag might not be ready for the minimum viable product, but I can draw up what that might look like as well if it helps. Is it easier to build it in now and, if that function is not ready, just say so? Or to add it later when it is functional?

TNRiley commented 2 years ago

Added the Export page.

I tried to provide the most flexibility here so that folks can select slices of data if they wanted, both in the Source/Tag options as well as in the crossover/unique. I also want to make sure that folks can export the raw Source/Tag .ris that has not been deduplicated - this will give us a hand in testing things as well.

nealhaddaway commented 2 years ago

Yeah true - I'll just think about an option for pulling it in automatically. Probably best to offer that to the user if a 'search_record_start' tag is found from the JSON file I've been talking about, or an option for a user to select a field. Yup

nealhaddaway commented 2 years ago

The current shiny mockup has a single upload option with source and tag, you can see the summary added to the datatable and the refs are read into a reactive value inside the app. :)

https://estech.shinyapps.io/citesource/

nealhaddaway commented 2 years ago

@chriscpritchard @LukasWallrich @kaitlynhair @DrMattG

How is it best to store the uploaded data in the Shiny app?

I could either save each input as a list - list(data=data, source=source, tag=tag), and make a list of lists, or I could build a single dataframe with source and tag as two additional columns? I'm thinking the latter is easier...?
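For comparison, here is a rough sketch of the two layouts, using plain Python structures as stand-ins for R lists and data frames. The sample records and the `flatten` helper are made up for illustration.

```python
# Option 1: a list of lists, one entry per upload, each holding its own records.
uploads = [
    {"data": [{"title": "A"}, {"title": "B"}], "source": "WoS", "tag": "initial"},
    {"data": [{"title": "B"}], "source": "ASFA", "tag": "initial"},
]


# Option 2: one flat table, with source and tag repeated on every row.
def flatten(uploads):
    """Turn the nested uploads into rows with 'source' and 'tag' columns."""
    return [
        {**record, "source": upload["source"], "tag": upload["tag"]}
        for upload in uploads
        for record in upload["data"]
    ]


rows = flatten(uploads)

# With the flat layout, filtering or grouping by source is a one-liner.
wos_rows = [row for row in rows if row["source"] == "WoS"]
```

The flat layout repeats the labels on every row, but deduplication and plotting code then only has to deal with one table.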

kaitlynhair commented 2 years ago

The latter is probably simpler... at least for me to work with!

nealhaddaway commented 2 years ago

Sweet - I hate lists anyway...

Matt and I are getting the df upload sorted and I'm gonna work on getting the Shiny tabs ready for the viz. It's all in the shiny branch FYI

TNRiley commented 2 years ago

[Image: CiteSource hex logo]

rootsandberries commented 2 years ago

Did you design this logo Trevor?? It's awesome!

TNRiley commented 2 years ago

Yes! Found this and couldn't not play around ;)

I might play around with it a little, but am glad you like it!! - Seeing a hex for the project makes it feel even more real.

nealhaddaway commented 2 years ago

OK the shiny app draft is working now: https://estech.shinyapps.io/citesource/ I'd forgotten to add library(markdown).

Nicely, you can now just draft the formatted text you want in each section here and I can copy the markdown out of this issue and into a markdown file to populate each text section. That keeps the shiny app nice and clean.

kaitlynhair commented 2 years ago

Just an update on the app: I've added in basic deduplication (from ASySD) and source comparison functionality to the shiny branch - had some discussion with Trevor and Lukas this morning about improving the user experience / adding loading bars etc. Hoping this will help us visualise the workflow and highlight what we still need to work on within the R package!

TNRiley commented 2 years ago

Closing due to broad categorization. Will create issues around specific work related to shiny.

ESHackathon / CiteSource

Shiny/GUI #17