ClinGen / gene-and-variant-curation-tools

ClinGen's gene and variant curation interfaces (GCI & VCI). Developed by Stanford ClinGen team.
https://curation.clinicalgenome.org/
MIT License
3 stars 1 forks source link

Allow users to download filtered .csv files #252

Open cgpreston opened 1 year ago

cgpreston commented 1 year ago

When a user filters on any of the following pages:

Screen Shot 2022-09-27 at 3 01 23 PM
liammulh commented 1 year ago

@cgpreston, you wrote:

The "Download (.CSV)" button should transition to allow the users to download the filtered results (such as seen in image)

Do you mean (1) the text of the button should change when there is a filter to indicate to the user that they are downloading the filtered results rather than all of the results or (2) that currently the user is unable to download filtered results?

I checked the "My Affiliation's Variant Interpretations" and "My Affiliation's Gene-Disease Records" tables, and it looks like I am able to download filtered results.

Checked for differences between the files

My Affiliation's Variant Interpretations

  1. Navigated to /dashboard with my affiliation set to "Hearing Loss".
  2. For the "My Affiliation's Variant Interpretations" table, clicked the "Download (.CSV)" button on with no filters.
  3. Renamed the download to var-dashboard-tables-export-1.csv.
  4. Added the "Provisional" filter.
  5. Clicked the download button again.
  6. Renamed the second download to var-dashboard-tables-export-2.csv.
  7. At the command line, checked to see if the files are the same: diff var-dashboard-tables-export-1.csv var-dashboard-tables-export-2.csv. Since the command line tool diff had output, it seems the two files are different. (If they were the same, there would be no output.)

My Affiliation's Gene-Disease Records

  1. Navigated to /dashboard with my affiliation set to "Hearing Loss".
  2. For the "My Affiliation's Gene-Disease Records" table, clicked the "Download (.CSV)" button on with no filters.
  3. Renamed the download to gene-dashboard-tables-export-1.csv.
  4. Added the "Provisional" filter.
  5. Clicked the download button again.
  6. Renamed the second download to gene-dashboard-tables-export-2.csv.
  7. At the command line, checked to see if the files are the same: diff gene-dashboard-tables-export-1.csv gene-dashboard-tables-export-2.csv. Since the command line tool diff had output, it seems the two files are different. (If they were the same, there would be no output.)

Checked number of lines in each file

File Number of Lines
var-dashboard-tables-export-1.csv 198
var-dashboard-tables-export-2.csv 13
gene-dashboard-tables-export-1.csv 184
gene-dashboard-tables-export-2.csv 4
liammulh commented 1 year ago

Okay, @cgpreston met with me via Zoom and showed me the issue.

Problem description

If you "Filter By Status", the table results will match the download results. However, if you use the search bar, the table results will not match the download results. If you use the search bar and "Filter By Status", you will only get results for "Filter By Status".

Problem illustration

Filter by status then search

  1. In the "Filter By Status" dropdown, select "Provisional". As of 2023-06-26, you get 12 results.
  2. Type "A>T" in the search bar. This will narrow down the results in the table to the ones containing "A>T" and status "Provisional". This is the correct behavior for the results in the table. However, if you download the CSV, you will notice that the data is the data for the status filter. The downloaded data is incorrect.

Search then filter by status

  1. Type "A>T" in the search bar. As of 2023-06-26, you get two results.
  2. In the "Filter By Status" dropdown, select "Provisional". The results will increase rather than decrease. The table will show all the results for the status filter. This is incorrect behavior for the results in the table. If you download the CSV, you will notice that the data is the data for the status filter. The downloaded data is incorrect.

Using sets to describe the above scenarios

Let the data for "Filter By Status" be $D_S$. Let the data for globalFilter be $D_G$. Let the data shown in the table be $D_T$. Let the data shown in the download be $D_D$.

Nitty-gritty details

In (1), you'll see that data gets filtered based on statusFilters and props.data. I believe statusFilters are the filters you can set with "Filter By Status", and props.data is the data you get back from a fetch in (2). It seems like the code in (1) doesn't take the search bar input into consideration when it filters. I was confused as to how the search bar actually works if the code in (1) isn't taking the search bar input into consideration. Then I saw the definition for the GlobalFilter component in (3). It looks like the GlobalFilter component uses a piece of state called globalFilter, which comes from the hook useGlobalFilter, which is a hook defined by the react-table NPM library. If you use the search bar to narrow the results, the table will be updated because the table is made using the react-table library, see (4). Thus, it seems we have two ways of narrowing the results in the table, and these two ways don't communicate with each other.

Crucially, the handleDownload (5) function operates on filteredData, which isn't necessarily the same data as the data we get from globalFilter.

(1) https://github.com/ClinGen/gci-vci-aws/blob/10049d1af0b5883fa0dd723fcb50b3b17ff5f580/gci-vci-react/src/components/InterpretationsTable.js#L32-L116

(2) https://github.com/ClinGen/gci-vci-aws/blob/10049d1af0b5883fa0dd723fcb50b3b17ff5f580/gci-vci-react/src/pages/dashboard.js#L46-L68

(3) https://github.com/ClinGen/gci-vci-aws/blob/10049d1af0b5883fa0dd723fcb50b3b17ff5f580/gci-vci-react/src/components/common/TableComponents.js#L7-L29

(4) https://github.com/ClinGen/gci-vci-aws/blob/10049d1af0b5883fa0dd723fcb50b3b17ff5f580/gci-vci-react/src/components/InterpretationsTable.js#L221-L260

(5) https://github.com/ClinGen/gci-vci-aws/blob/10049d1af0b5883fa0dd723fcb50b3b17ff5f580/gci-vci-react/src/components/InterpretationsTable.js#L266-L316

Summary

In the code, it seems like there are two different ways to filter the data. Only one of those ways is taken into consideration when the user clicks the download button.

@cgpreston, I think fixing this would take at least a few days of work for me with my level of familiarity with the code. I've already spent about 2 hours on it. Let me know if you want me to do more.

liammulh commented 1 year ago

@cgpreston and I discussed this today in a meeting. We decided it is not worth the time investment right now because of the VCI v4 upgrade.

For posterity:

I did more investigation on this issue the week before last week. I spun my wheels trying to find an easy way to download the correct data. There are probably a few hacky solutions, but I generally dislike hacky solutions because they increase the complexity of the code base and eventually it becomes difficult to maintain and difficult to add fixes/features.

I think fixing this in a non-hacky way would require a decent chunk of time. Maybe a week. If we decided it was a priority to fix this, I would read up on the library that is used for the table. The library is called "react-table". The version we are using is behind by a major version. If we had plenty of time, I'd say upgrade to the newest version (version 8). If we wanted to stick with the old version, the docs and the examples for the old major version (version 7) are still available. Specifically, I would look at https://github.com/TanStack/table/tree/v7/examples/filtering. There is a CodeSandbox for this code here. In that example, you'll notice they have a search bar filter and a select filter. I would try to refactor our download code so that it takes both the search bar filter and the select filter into consideration rather than taking just the select filter into consideration. I think this would involve moving the download function down in the component hierarchy. (My working hypothesis is that the download function would need to occur after the useFilters hook and useGlobalFilter hook have been used.) Then we would need to manually test this solution for regressions, which would take a while.