292 - Fix: Implement the dataset loader for polling the latest dataset state for the dataset page UI

nozomione commented 9 months ago

Issue Number

NOTE: The changes in 5 files were simply the results of fixing hook imports and argument orders, deletion of a file, and state imports.

Closing #292

Purpose/Implementation Notes

Implemented logic for polling the latest state of the processing dataset, and based on that state, render the appropriate view (processing/ready for download/error) on the dataset page.

Created the useResourceLoader hook (refactored useOneoffExperiment hook) for downloading files for both user-created datasets and one-off experiments datasets.

Please see the latest UI here.

For the user-defined datasets (My Dataset): Step 1: Add samples to My Dataset Step 2: Start the dataset download process by clicking the "Download" button in the dataset page Step 3: Once the dataset processing is completed, the application renders the appropriate UI (success/error)

Screenshot 2023-11-20 at 9 17 59 PM

For the one-off experiment datasets: Step 1: Click the "Download Now" button in a search card on the search results page to start the download process Step 2: The application renders the processing budge while the one-off experiment dataset being processed

Screenshot 2023-11-21 at 2 01 37 PM

(search results page: while processing)

(experiment page: while processing)

Types of changes

Breaking change (fix or feature that would cause existing functionality to not work as expected)

Functional tests

Tested the implementation and made sure it works for both My Dataset download and the dataset download for one-off experiment.

Checklist

[x] Lint and unit tests pass locally with my changes
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added necessary documentation (if appropriate)
[ ] Any dependent changes have been merged and published in downstream modules

vercel[bot] commented 9 months ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
refinebio-web	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jun 13, 2024 10:13pm

nozomione commented 7 months ago

@davidsmejia this PR is ready for another look.

nozomione commented 7 months ago

@davidsmejia I've applied the feedback and this PR is ready for another look!

🗒️ I filed the issue #329 (under the epic #336). It's for a minor refactoring of the fetchAsync helper to update the response it returns for setting our states based on the sever status (error or success), and to update other resource fetches match this new change made in this PR (to update the resource on successful fetch). Please take a look.

nozomione commented 3 months ago

Let's remove the usage of "resource" in variable names and function names associated with this PR.

What term would you suggest in place of resource ?

The dataset and datasetId are taken by DatasetManager, and we're also passing an accession code to this hook, we want to use a name that represents both of those resources. Perhaps what do you think about selectedDataset?

Break up the "getters" since we will know when using this hook if we want to check if an accession has a processing dataset or if we are checking by dataset id explicitly.

Currently the resource id check is done by using the regular expression: https://github.com/AlexsLemonade/refinebio-web/blob/09fdc6e996d56ccf7c41cd47f6150221ca7eb8b0/src/hooks/usePollDatasetStatus.js#L49-L51

Could you elaborate on this one?

Thank you David!

davidsmejia commented 3 months ago

Let's remove the usage of "resource" in variable names and function names associated with this PR.

What term would you suggest in place of resource ?

The dataset and datasetId are taken by DatasetManager, and we're also passing an accession code to this hook, we want to use a name that represents both of those resources. Perhaps what do you think about selectedDataset?

Break up the "getters" since we will know when using this hook if we want to check if an accession has a processing dataset or if we are checking by dataset id explicitly.

Currently the resource id check is done by using the regular expression:

https://github.com/AlexsLemonade/refinebio-web/blob/09fdc6e996d56ccf7c41cd47f6150221ca7eb8b0/src/hooks/usePollDatasetStatus.js#L49-L51

Could you elaborate on this one?

Thank you David!

I think these questions are related so I am going to provide a single response to all of them first.

The useDatasetManager should be responsible for creating, querying, configuring, and describing a dataset. Essentially activity related to a specific dataset.

The usePollDatasetStatus is the collection of logic around polling a specific dataset. This currently, and correctly, uses useDatasetManager under the hood to perform the query.

The change I would like to see here is how we associate a specific accession code with a dataset.

Currently usePollDatasetStatus does the following in this PR:

Initializes the hook with either an accession code or a dataset id.
fetches the latest state on mount
provides helper methods for managing a list of currently polled datasets, either by ID of accession code.
stores an array of objects that contain datasetid and optionally accession code for association

I would like to see the ambiguous argument (dataset id / accession code) removed from the initialization of the hook and from the `add.

We would accomplish this by:

removing the initialization argument from the usePollDatasetStatus hook
remove the useEffect that checks if the ID can be found and refreshes if it is lines 15-19
move processingDatasets (old name processingResources) useDatasetManager
Add a local storage state in RefinebioContext / exposed in hook with value of datasetAccessions that will default to an empty dict. The keys will be accession codes and the values will be dataset ids.
add new method pollDatasetId
add new method pollDatasetAccession
- these functions should just set the datasetid and that is what the useEffect/polling logic should check

nozomione commented 3 months ago

The following updates were completed:

remove the useEffect that checks if the ID can be found and refreshes if it is lines 15-19

move processingDatasets (old name processingResources) useDatasetManager

And now that we've moved processingDatasets to useDatasetManager, I've also:

moved addDatasetToProcessingDatasets(previously addProcessingDataset from usePollDatasetStatus) to useDatasetManager
call addDatasetToProcessingDatasets from startProcessingDataset to handle the addition of a new processing dataset (no longer calling it from two separate locations)

(see commit at be12eb546c9d68058a5c506f981852100c7e0057)

For the following items, I have a few questions:

Q 1: How do we determine which dataset ID to poll inside the hook if we remove the hook's parameter?

removing the initialization argument from the usePollDatasetStatus hook

Reasons for passing an ID as an argument to usePollDatasetStatus hook

- To quickly check the given ID and initiate a polling request on component mount - To keep the polling logic inside the hook as much as possible (including ID check, start/refresh/end polling) ```js usePollDatasetStatus(processingId) // either a dataset ID or an one-off experiment accession code ``` When the following UIs are viewed by the user (on component mount), this hook checks the given ID, starts polling requests, and display the processing UI while it's being processed: - Search card and its download modal in the search results page (passing an accession code to the hook): ![Screenshot 2024-06-03 at 5 14 09 PM](https://github.com/AlexsLemonade/refinebio-web/assets/31800566/db68b1a3-42bc-4cbf-89ea-d9b008afee09) - Experiment page for a specific accession code (passing an accession code to the hook): ![Screenshot 2024-06-03 at 5 14 20 PM](https://github.com/AlexsLemonade/refinebio-web/assets/31800566/280ba0b7-f6b3-4611-9a8e-41b799dd4c94) - Dataset page that renders a specific dataset ID (passing a dataset ID in the URL to the hook): (e.g., `/dataset/695fd15-2f7c-4b5c-8b39-212f9ee7b21e`) ![Screenshot 2024-06-03 at 5 15 06 PM](https://github.com/AlexsLemonade/refinebio-web/assets/31800566/a4290357-db02-438e-8a4b-632fef05c36c)

By removing the argument for the hook (ID to check) and adding the next two methods:

add new method pollDatasetId

add new method pollDatasetAccession

these functions should just set the datasetid and that is what the useEffect/polling logic should check

Q 2: Where would you like the dataset ID to be set and start the polling request?

To put another way, if this hook no longer accepts an ID as an argument, then at first, the ID check must to be done outside of the hook (a component that uses this hook by calling these new methods, useEffect etc). And possibly adding a new state to the hook to store that obtained dataset ID so that we can use it inside the hook to start polling.

I suggest preserving the hook's argument to maintain the quick ID check and the polling logic within the hook as much as possible.

I'd like to hear your input on this. Please let me know, thank you David!

davidsmejia commented 3 months ago

Ok, I think your questions stem from the same point so let me try clarifying this first and if still unclear I will try to tackle the questions separately.

Add a local storage state in RefinebioContext / exposed in hook with value of datasetAccessions that will default to an empty object. The keys will be accession codes and the values will be dataset ids.

So you don't have to care that it is a valid accession code here. When you add the dataset with optional accession code, you populate a separate object that functions like a look up table. No need for the regex.

So update what you have in useDatasetManager starting at line 34 Here you only need to assign a flat value of processing datasets and an object to look up by accession code. So later in the application you will always know one of two things when you need to check if a dataset is processing:

The dataset id, this is obvious
The accession code which can be looked up to get the dataset id

useDatasetManager.js

 const addToProcessingDatasets = (id, accessionCode) => {
    if (accessionCode) {
      setDatasetAccessions({...datasetAccessions, [accessionCode]: id})
    }
    setProcessingDatasets((prev) => {
      if (prev.includes(id)) return prev
      return [...prev, id]
    })
  }

Then you want to poll for a specific dataset. This should be done by calling one of the two functions below: usePollDatasetStatus.js


// this is the state variable that will trigger a rerender when updated
const [polledDatasetId, setPolledDatasetId] = useState(null)
const [datasetState, setDatasetState] = useState(null)
const timerRef = useRef(null)

// check if the hook should poll a dataset
  useEffect(() => {
    if (polledDatasetId) {
      timerRef.current = setInterval(() => {
        refreshProcessingDataset()
      }, 1000 * 60)
    }
    return () => {
      clearInterval(timerRef.current)
    }
  }, [polledDatasetId])

// by dataset ID 
pollDatasetId(datasetID) {
  // the dataset doesnt need to be polled anymore because it has been removed
  if (processingDatasets.includes(datasetID)) {
    setPolledDatasetId(datasetID)
  }
}

// tell the hook to watch a specific accession
/// call the above method if there is a match
pollDatasetAccession(accession) {
  if (datasetAccessions.hasOwnProperty(accession)) {
    pollDatasetId(datasetAccessions[accession])
  }
}

Let me know if you can see how this would work. Each hook would only be responsible for tracking the state of a single dataset. We should have a separate cleanup function ie removeProcessingDataset that only takes dataset id and removes it from the list of processing datasets and the datasetAccessions object.

nozomione commented 3 months ago

The following updates were completed:

removing the initialization argument from the usePollDatasetStatus hook

Now the hook no longer takes an ID argument and instead uses a new state, polledDatasetId, to set the processing dataset ID.

(see commit at 0d8ac185cfeb6566a552ba97471d3b7dd23c8ac9)

Add a local storage state in RefinebioContext / exposed in hook with value of datasetAccessions that will default to an > empty dict. The keys will be accession codes and the values will be dataset ids.

A lookup table datasetAccessions local storage state has been added to RefinebioContext to store the accession code of processing one-off experiment and imported into useDatasetManager used by usePollDatasetState.

(see commit at b204a5da3c5de25e55ddaa8f0cccc44ca1aa95c3)

add new method pollDatasetId

add new method pollDatasetAccession

these functions should just set the datasetid and that is what the useEffect/polling logic should check

The new methods have been added and are called from useEffect of components that uses the usePollDatasetState hook to set polledDatasetId for polling.

(see previous commit and at 78d5819f65803bc752dea8ef826de5936b8f61e9)

a separate cleanup function ie removeProcessingDataset that only takes dataset id and removes it from the list of processing datasets and the datasetAccessions object.

The clean up method has been added which removes polledDatasetId from:

the processingDatasets array
the datasetAccessions object

Please note that I've also included a line that resets the polledDatasetId to null to stop the running polling timer (initially added it to the cleanup block of refreshProcessingDataset, but moved it to this method).

This PR is ready for your review, thank you David!

nozomione commented 3 months ago

I've updated the PR based on your feedback and ready for your review, thank you David!

nozomione commented 3 months ago

I've updated the PR based on your suggestions and ready for another look, thank you David!

(Note: the input to clear interval https://github.com/AlexsLemonade/refinebio-web/pull/294#discussion_r1628169852 was applied at 20607db98fa01e5661ccbbc8704d7c1f132774c5.)

AlexsLemonade / refinebio-web