ESHackathon / CiteSource

http://www.eshackathon.org/CiteSource/
GNU General Public License v3.0

Total search summary table #66

Closed - LukasWallrich closed this issue 1 year ago

LukasWallrich commented 1 year ago

@TNRiley to define how to report source contributions, sensitivity & accuracy in a table

@LukasWallrich to implement

TNRiley commented 1 year ago

Here is what I have so far. I'll add a few more posts to this in order to lay out which metadata fields should be included as well as a few more calculations.

Yield Unique = number of references retrieved by a source that were not found in any other source
Yield Crossover = number of references retrieved by a source/method that were found in at least one other source
Total Yield = total number of references retrieved by a source/method

Potentially Relevant = total number of references included after title/abstract screening
Relevant = total number of references included after full-text screening

Sensitivity/Recall (source level) = total number of references included from a source (unique & crossover) over the total number of references included across all sources. Example: WoS had 15 articles that were included after screening; there were 80 total articles included after screening, so sensitivity would be 18.75%. (This could be calculated after both the TI/AB screening and the full-text screening... the full-text calculation makes the most sense; however, there may be value in understanding recall after both phases.)

Sensitivity/Recall (total) = Total number of references included across all sources over the total number of articles screened (again this can be calculated after both TI/AB screening and final full-text screening)

Precision (source level) = number of references included from a database over the total yield of that database. Example: WoS had 15 articles that were included and a total yield of 780 articles, so precision would be ~1.92% (15/780).

Precision (total) = number of references included across all sources over the total yield across all sources. Example: WoS had 15 articles that were included, ASFA had 31, and Scopus had 4, i.e. 50 in total. The total yield across all sources was 1560, so precision would be 3.2% (50/1560).
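For concreteness, here is a minimal R sketch of these calculations using the numbers from the examples above. The WoS figures come from the examples; the ASFA and Scopus yields are invented so the totals work out, and nothing here is CiteSource's actual API:

```r
# Illustrative numbers: WoS values are from the examples above; the ASFA and
# Scopus yields are assumptions chosen so the total yield equals 1560
sources <- data.frame(
  source      = c("WoS", "ASFA", "Scopus"),
  total_yield = c(780, 520, 260),
  included    = c(15, 31, 4)  # references included after screening
)

total_included <- 80  # total included across all sources (from the example)

# Sensitivity/recall (source level): included from source / included overall
sources$sensitivity_pct <- round(100 * sources$included / total_included, 2)
# WoS: 18.75

# Precision (source level): included from source / total yield of that source
sources$precision_pct <- round(100 * sources$included / sources$total_yield, 2)
# WoS: 1.92

# Precision (total): all included / total yield across all sources
round(100 * sum(sources$included) / sum(sources$total_yield), 2)  # 3.21
```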

LukasWallrich commented 1 year ago

Thanks - very helpful!

Only the following is unclear to me - sounds a lot like precision (total). Could you clarify?

Sensitivity/Recall (total) = Total number of references included across all sources over the total number of articles screened (again this can be calculated after both TI/AB screening and final full-text screening)

LukasWallrich commented 1 year ago

I have now created a first version of the table - see here: https://www.eshackathon.org/CiteSource/articles/citesource_example.html#step-4a-review-a-summary-table

Can you please have a look and let me know what else would be helpful / whether you have particular formatting ideas?

TNRiley commented 1 year ago

> Thanks - very helpful!
>
> Only the following is unclear to me - sounds a lot like precision (total). Could you clarify?
>
> Sensitivity/Recall (total) = Total number of references included across all sources over the total number of articles screened (again this can be calculated after both TI/AB screening and final full-text screening)

I'm also adding the content below as it was something I was working on to help myself keep all these straight in my head! I've renamed things slightly to help make it a bit more logical. Let me know if this helps.

I'll take another look and may reach out to Alison too in order to get confirmation that these are correct. There are also a few calculations in the source level that are not part of the search summary table as it's published, but that would be useful to provide.

Post-search references will include:

Yield_Search = total number of references gathered in the search
Yield_Unique = total number of references gathered that were unique to a single source/method

Each source will have three numbers when comparing results across sources/methods (Cite_Source):

(Cite_Source)_Yield_Search = number of references retrieved from a source/method
(Cite_Source)_Yield_Unique = number of references from a source that were not found in another source/method
(Cite_Source)_Yield_Crossover = number of references from a source/method that were found in at least one other source/method

Screened references will include:

Yield_Screened = total number of references included after title/abstract screening
Yield_Unique_Screened = number of references included after title/abstract screening that were unique
Yield_Final = total number of references included after full-text screening
Yield_Unique_Final = number of references included after full-text screening that were unique

Source Level Calculations

Sensitivity/Recall = (Cite_Source)_Yield_Final / Yield_Final
Precision = (Cite_Source)_Yield_Final / (Cite_Source)_Yield_Search

Search Contribution = (Cite_Source)_Yield_Search / Yield_Search = percent of references a source/method contributed at search
Unique Contribution = (Cite_Source)_Yield_Unique / Yield_Unique = percent of unique references a source/method contributed
Unique Sensitivity/Recall = (Cite_Source)_Yield_Unique_Final / Yield_Final = percent of relevant references a source/method contributed that otherwise would not have been found

I have not seen papers that report this last calculation, but it could be used to determine the overall importance/impact of a source/method.

Total Calculations

Sensitivity/Recall = SEE COMMENT BELOW
Precision = Yield_Final / Yield_Search
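Here is a minimal R sketch of this naming scheme and the calculations, assuming invented counts (including the deduplicated Yield_Final total of 16); this is illustration only, not CiteSource code:

```r
# Per-source counts; column names mirror the scheme above, values are made up
per_source <- data.frame(
  source             = c("WoS", "Lens", "Dim"),
  Yield_Search       = c(219, 175, 132),
  Yield_Unique       = c(50, 23, 18),
  Yield_Unique_Final = c(5, 3, 0),
  Yield_Final        = c(15, 12, 2)
)

Yield_Search_total <- sum(per_source$Yield_Search)  # all references gathered
Yield_Unique_total <- sum(per_source$Yield_Unique)  # unique to a single source
Yield_Final_total  <- 16  # deduplicated full-text includes (assumed)

transform(per_source,
  Sensitivity         = Yield_Final / Yield_Final_total,
  Precision           = Yield_Final / Yield_Search,
  Search_Contribution = Yield_Search / Yield_Search_total,
  Unique_Contribution = Yield_Unique / Yield_Unique_total,
  Unique_Sensitivity  = Yield_Unique_Final / Yield_Final_total
)
```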

TNRiley commented 1 year ago

Looking at Total Sensitivity/Recall, the way it's calculated is Yield_Final (for a method) over Yield_Final (for all methods). Calculating this total is possible if a user were to use Cite_String to classify sources as a particular method (handsearching vs. database searching vs. citation chasing vs. organizational website searching, etc.). Strings could also be considered methods, e.g. if a single database was being evaluated and the user wanted to validate a hedge. However, if only one method was used, this would be N/A.

The working example in the long vignette would actually be a good one to use for this, as multiple methods were used there (database searching, citation chasing from a previous related map).

LukasWallrich commented 1 year ago

That makes sense - and seems equally applicable to sources? I.e. share of all final results found in a specific source?


LukasWallrich commented 1 year ago

Trying to keep track of all of this, it seems that only the following needs to be done to complete the current version:

@TNRiley would your final comment just be about replacing source with method (i.e. string)? Then no change is needed. If you are thinking about nesting methods within sources, that sounds very complicated - I would leave it up to the user to simply combine sources and methods into one string and then manually adjust the parts of the table that need to reflect the hierarchical structure. Or what would you suggest is needed here?

TNRiley commented 1 year ago

Sensitivity and recall are used interchangeably; I personally prefer recall. We can label it sensitivity/recall, which is common. Anyone who needs this should know either way, IMHO.

rootsandberries commented 1 year ago

This is looking great! Regarding sensitivity vs. recall, maybe just 'sensitivity' to declutter the table a bit and align with the Bethel paper on SSTs? Also, I'm wondering whether 'crossover' is really needed - the term is not intuitive to me, and since we give both the number and percentage of unique records, it seems somewhat redundant. Again, just thinking of how to simplify the table.

LukasWallrich commented 1 year ago

Thanks @rootsandberries - I have simplified the table accordingly. We should document the values in the table. Could you maybe write brief table notes explaining what the columns are about? Then I would add them.

rootsandberries commented 1 year ago

I will do my best to write some explanations...I need to get my head around these different parameters myself, and writing some table notes will be a good way to do that :-)

rootsandberries commented 1 year ago

I've had a hard time understanding the current version of the table. For what it's worth, I'm providing another sketched-out table format that aligns pretty well with the Bethel paper, with associated definitions below. It's much simpler, but as a result it loses a lot of the potentially informative detail CiteSource could provide. I'm by no means suggesting we replace the other one... but I thought I would put this here as an idea. I didn't generate this with code, hence the made-up numbers.

Effectiveness summary

| Source | Search (Unique Contributions) | Final (Unique Contributions) | Sensitivity | Precision |
| --- | --- | --- | --- | --- |
| WoS | 219 (50) | 15 (5) | 93.75% | 6.85% |
| Lens | 175 (23) | 12 (3) | 75.00% | 6.86% |
| Dim | 132 (18) | 2 (0) | 12.50% | 1.52% |
| Total | 526 | 29 | 100.00% | 6.11% |
| Deduplicated | 262 | 16 | - | - |

Definitions:

Search: The number of records retrieved by the search

Unique contributions: The number of unique records contributed by the source that were not found in other sources

Final: The number of records included after screening titles and abstracts

Sensitivity: This calculation refers to the Final references and is the number of final references identified by the database relative to the total deduplicated number of final references found by all search methods.

Precision: Number of final references identified by the database relative to the total number of references found by that database.

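Although the numbers above were not generated with code, here is one hedged way such a table could be rendered in R with the gt package (not how CiteSource actually builds its tables):

```r
library(gt)

# Made-up numbers from the sketch above
eff <- data.frame(
  Source      = c("WoS", "Lens", "Dim", "Total", "Deduplicated"),
  Search      = c("219 (50)", "175 (23)", "132 (18)", "526", "262"),
  Final       = c("15 (5)", "12 (3)", "2 (0)", "29", "16"),
  Sensitivity = c("93.75%", "75.00%", "12.50%", "100.00%", "-"),
  Precision   = c("6.85%", "6.86%", "1.52%", "6.11%", "-")
)

gt(eff) |>
  tab_header(title = "Effectiveness summary") |>
  cols_label(
    Search = "Search (Unique Contributions)",
    Final  = "Final (Unique Contributions)"
  )
```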
LukasWallrich commented 1 year ago

Thanks - this looks cleaner, and definitely shows that we need to improve the clarity of the table.

However, I think it would be worth showing the unique contribution of each source more explicitly ... whether that is the unique contribution % shown in the current version or something more concrete - e.g., the number needed to screen to find a unique contribution - but it would be great to see directly which sources add the most value, and which are most efficient at doing so. Any ideas?

Regarding the layout, we need to decide on the orientation. I had the stages and sources in rows so that the table can grow easily when more stages (e.g., screening steps) or more sources are considered. For a single table, your approach is more visually appealing - but we will struggle to get it to look nice both for 3 sources over 2 stages and 10 sources over 4 stages. Do you think the 'long' version can work if it is more clearly displayed? What do you find most unclear?

An advantage of the long version might also be that we can show sensitivity for all stages, and precision for all but the first. Do you think that is useful information?

rootsandberries commented 1 year ago

Yes, I totally agree uniqueness needs to be more informative than what I've proposed. I think I'm having a hard time making sense of the current unique contribution numbers in the table, mainly because I don't understand where the values under final/records are coming from (like 29 vs. 5 etc.). The way I'm thinking about it is in the form of some sort of 'uniqueness index'...so something that would be zero for no unique contributions and one for only unique contributions, where unique contributions refers to contributions to a screened or included set--a ton of unique contributions, none of which get included, would be a bad thing. But if it could also take into account the proportion of total results to unique results, that could get at efficiency. So maybe:

The problem is that the number, under typical circumstances, will always be really small. I'm not very familiar with the design of indexes, but maybe there's a way to design a good 0 to 1 index for something that is comparing very small to very large numbers....

I'm sure I'm overthinking this, and probably just proposing something that Trevor has already proposed above but in a different more confusing way :-)

As for the design of the table, yes, I see your point about rows... so something more like below. I do think there's value in sensitivity and precision values for different stages... that makes sense. I was a little confused about the sensitivity values in the current table... it seems like they should be the total found in the database relative to the total records after deduplication, but they appear to be calculated against the un-deduplicated number. Or maybe I'm wrong about that reasoning!

| Source | Yield (Unique Contributions) | Sensitivity | Precision |
| --- | --- | --- | --- |
| **Search** | | | |
| WoS | 219 (50) | 83.58% | - |
| Lens | 175 (23) | 66.79% | - |
| Dim | 132 (18) | 50.38% | - |
| Total | 526 | - | - |
| Deduplicated | 262 | - | - |
| **Final** | | | |
| WoS | 15 (5) | 93.75% | 6.85% |
| Lens | 12 (3) | 75.00% | 6.86% |
| Dim | 2 (0) | 12.50% | 1.52% |
| Total | 29 | 100.00% | 6.11% |
| Deduplicated | 16 | - | - |
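To illustrate the 'long' orientation being discussed, here is a hedged gt sketch of the same made-up numbers; grouping rows by stage means the table only grows downward as screening stages or sources are added:

```r
library(gt)

long <- data.frame(
  Stage  = rep(c("Search", "Final"), each = 5),
  Source = rep(c("WoS", "Lens", "Dim", "Total", "Deduplicated"), times = 2),
  Yield  = c("219 (50)", "175 (23)", "132 (18)", "526", "262",
             "15 (5)", "12 (3)", "2 (0)", "29", "16"),
  Sensitivity = c("83.58%", "66.79%", "50.38%", "-", "-",
                  "93.75%", "75.00%", "12.50%", "100.00%", "-"),
  Precision   = c("-", "-", "-", "-", "-",
                  "6.85%", "6.86%", "1.52%", "6.11%", "-")
)

# One row group per stage; a new screening stage is just five more rows
gt(long, groupname_col = "Stage") |>
  cols_label(Yield = "Yield (Unique Contributions)")
```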
LukasWallrich commented 1 year ago

Yes, that makes sense. I quickly worked on the formatting to make the table somewhat clearer. I will give these edits a go, and think about how to include the deduplication - that is missing, and the precision is thus off. I would keep yield and unique contribution in two columns so that the numbers can be aligned - makes reading a bit easier?

LukasWallrich commented 1 year ago

I have now made another attempt at this - with fewer columns, clearer formatting, and table notes based on your suggestions @rootsandberries. You will notice that I edited the notes quite heavily towards an expression of % that I find clearer ... but I'm very happy to go with yours if you prefer. Can you have a look?

I also added an option to only show the top-x sources ... otherwise, the tables can get very long (as shown in the long example) and the main focus might well be on comparing the large sources.

The function now also works for cite_strings ... not sure if we have an example where that would be relevant. Mostly for testing, I now added it to the long vignette - since I don't understand what the strings there represent, I can't judge whether it is interesting - please delete if not.

LukasWallrich commented 1 year ago

Further options on hold - @TNRiley will reach out to Alison

TNRiley commented 1 year ago

I've been rolling this over for a bit while looking at the current search summary table. I found it pretty confusing in my own analysis, and I think we should first create a master table with as much information as possible. I've also given a lot of thought to the naming of record types. Here is where I've landed on what the table should include for each source. The information in this table covers only the initial deduplication process.

FOR EACH SOURCE

Raw citations = number of citations exported from a source (or assigned to a source)
Database-specific unique records (D-SUR) = total number of citations from a database after internal deduplication; currently listed for each database under the total column (will equal non-unique + unique)
(Source)Non-unique = number of records from a source that were found in at least one other source
(Source)Unique = number of records that were found in only that source (currently listed under the unique column)

AS A TOTAL

Total Raw = total number of all records uploaded before any internal or cross-source deduplication
D-SUR Total = total number of records after internal source deduplication
Total Non-unique = total number of records found in two or more sources
Total Unique = total number of records that were unique to a single source
Total Records = total number of records after both internal and cross-source deduplication (the true number of records a team would need to screen at title and abstract)

Further Calculations

Initial Contribution % = D-SUR / D-SUR Total
Unique Contribution % = (Source)Unique / Total Records
Percent of Unique = (Source)Unique / Total Unique
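A minimal R sketch of these counts and calculations; every number is invented, and the column names are not CiteSource's actual schema:

```r
src <- data.frame(
  source = c("WoS", "Lens", "Dim"),
  raw    = c(250, 190, 140),  # citations as exported, before any deduplication
  d_sur  = c(219, 175, 132),  # after internal (within-source) deduplication
  unique = c(50, 23, 18)      # records found in only this source
)
src$non_unique <- src$d_sur - src$unique  # found in at least one other source

d_sur_total   <- sum(src$d_sur)   # 526
total_unique  <- sum(src$unique)  # 91
total_records <- 262              # after cross-source deduplication (assumed)

src$initial_contribution <- src$d_sur / d_sur_total
src$unique_contribution  <- src$unique / total_records
src$percent_of_unique    <- src$unique / total_unique
src
```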

LukasWallrich commented 1 year ago

Currently, I can't find a way to show raw citations (due to #119) ... so I need to put this on hold until that is fixed (if we agree on the second comment, I can create a workaround quickly, but I don't want to touch the deduplication code while Kaytlin is working on it).

TNRiley commented 1 year ago

> Currently, I can't find a way to show raw citations (due to #119) ... so I need to put this on hold until that is fixed (if we agree on the second comment, I can create a workaround quickly, but I don't want to touch the deduplication code while Kaytlin is working on it).

I have raw citations in the new table I built - see the plot sandbox. I don't think the page is building due to some unresolved R CMD check issues, but if you run the script locally it'll work.

TNRiley commented 1 year ago

I think I've got it done. I just pushed this to the plot sandbox. Here is what it looks like: [screenshot of the draft summary table]

TNRiley commented 1 year ago

Changed this from the DT to the gt package and added footnotes. I also changed D-SUR to S-SUR, as these are sources and not always databases. I think this looks pretty polished and could serve as the search summary table now. A second table could focus on how the numbers change across screening.

[screenshot of the gt search summary table]
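For reference, here is a minimal sketch of attaching footnotes in gt; the data frame and labels are illustrative, not the actual sandbox code:

```r
library(gt)

df <- data.frame(
  Source  = c("WoS", "Lens", "Dim"),
  Records = c(219, 175, 132),
  Unique  = c(50, 23, 18)
)

# Footnote anchored to a column label, roughly as in the screenshot
gt(df) |>
  tab_footnote(
    footnote  = "Records remaining after within-source deduplication (S-SUR).",
    locations = cells_column_labels(columns = Records)
  )
```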

LukasWallrich commented 1 year ago

This looks great, well done! Would it make sense to add sensitivity and specificity here?

Re the other table, should we keep the format of the current summary table and actually simplify it - just show the number of unique and duplicated citations, plus sensitivity and specificity? Then users can always go back to a (filtered) version of your table to see everything at a given stage.


TNRiley commented 1 year ago

I think that with this table we can now make a pretty simple table to show precision/sensitivity. I'm going to take a shot at it; in essence, it would have 5 columns and however many rows based on the number of sources. The 5 columns would be S-SUR (still poorly named, but representing the number of records from a source after internal deduplication), Relevant Records (the number of records included after title and abstract screening), Included Records (the number of records included at full-text screening), Sensitivity/Recall, and Precision. (A sketch of this layout follows below.)

I don't think we need to break things down by screened and final the way we have it on the current search summary table because I actually think that the precision/sensitivity doesn't make much sense at the screened phase and obviously doesn't make sense for the search phase.

I'll try to put something together based on the new summary table, but I'll need some assistance integrating the functions into plots.R since I have yet to do that. Maybe we can walk through that live at next week's meeting?
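A hedged sketch of the proposed five-column layout mentioned above; all counts are invented, including the deduplicated final total used for sensitivity:

```r
tbl <- data.frame(
  Source   = c("WoS", "Lens", "Dim"),
  Records  = c(219, 175, 132),  # per source after internal dedup ("S-SUR")
  Relevant = c(30, 22, 8),      # included after title/abstract screening
  Included = c(15, 12, 2)       # included after full-text screening
)

final_total <- 16  # deduplicated count of final includes (assumed)

tbl$Sensitivity <- round(100 * tbl$Included / final_total, 2)
tbl$Precision   <- round(100 * tbl$Included / tbl$Records, 2)
tbl
```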

TNRiley commented 1 year ago

@LukasWallrich how does this look as a second table? I need to reach out to Alison to have her review these two; I want clarification on the total precision and total sensitivity/recall. I've added this to the plot sandbox as well. Obviously, we are still running into the "unknown" issue with this, but that will go away once it is resolved.

[screenshot of the draft precision/sensitivity table]

TNRiley commented 1 year ago

[screenshot of the precision/sensitivity table with totals]

After reviewing this I see a couple of issues. The screened-included and final-included numbers are obviously the totals; however, they are totals over items that were both unique and duplicated across sources. This means the total is not accurate - it's really a count of how many times the citations that were included after screening were found.

Maybe the answer here is that we just don't include a total for these columns? Otherwise, we'll need to pull this number from somewhere else and add a footnote to explain it. Not sure.
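To make the totals issue concrete, here is a small dplyr sketch. If each citation carries a deduplicated record ID, summing the per-source rows counts occurrences, while the correct total counts distinct IDs (the column names are assumptions, not CiteSource's schema):

```r
library(dplyr)

hits <- data.frame(
  record_id      = c(1, 1, 2, 3, 3, 4),  # deduplicated record IDs
  source         = c("WoS", "Lens", "WoS", "WoS", "Dim", "Lens"),
  final_included = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Naive total: counts record 1 twice because two sources retrieved it
sum(hits$final_included)  # 4

# Correct total: distinct included records across all sources
hits |>
  filter(final_included) |>
  summarise(total = n_distinct(record_id))  # 3
```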

LukasWallrich commented 1 year ago

This looks good! And good catch re the totals - they should ideally be the correct totals. I would find it odd to have blanks there, but in a first version that would also be OK.

Also, I just noticed that we have recall at the search stage, just called source contribution, so reporting it only once here makes sense.

S-SUR is the only thing I don't like about the tables - can we just call it records in the second table and add a footnote saying that this is after internal deduplication? In the first, maybe call it distinct records?

Happy to talk you through the integration live next week - have a good weekend!

TNRiley commented 1 year ago

Love the suggestions! I've updated the first two tables, improved the footnotes, and updated some of the code where I had S_SUR_Count to make it distinct_count. [screenshots of the two updated tables]

TNRiley commented 1 year ago

Fixed this last one up as well - the screened and final record numbers are now accurate and I've added a footnote. I've also added a line to remove the 'unknown'-labeled records until that issue is resolved. [screenshot of the corrected table]

TNRiley commented 1 year ago

Added new count functions and table functions.