AlexsLemonade / refinebio

Refine.bio harmonizes petabytes of publicly available biological data into ready-to-use datasets for cancer researchers and AI/ML scientists.
https://www.refine.bio/

Tracking Data Requests #3361

Closed: davidsmejia closed this issue 12 months ago

davidsmejia commented 1 year ago

Context

We will soon process a set of experiments as a data request, and we want to be able to generate a report on how that processing went.

Problem or idea

The first step for tracking this should be to create a management command. The command should accept a single argument: an S3 URL to a file that contains the requested experiment accessions, one accession per line. We will use this file to fetch all of the values of interest listed below.
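A minimal sketch of the input-handling side of such a command, in Python since refinebio is a Django project. It assumes a boto3 S3 client is passed in (boto3's `get_object` returns a dict whose `Body` is a readable stream). The function names are hypothetical, dropping blank lines and duplicate accessions is an assumption, and the Django `BaseCommand` wrapper that would take the URL as its single positional argument is left out.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 URL: {url!r}")
    return parsed.netloc, key


def parse_accessions(text):
    """Return accessions from text containing one accession per line.

    Assumption: blank lines and surrounding whitespace are ignored, and
    duplicate accessions are dropped while keeping the original order.
    """
    seen = set()
    accessions = []
    for line in text.splitlines():
        accession = line.strip()
        if accession and accession not in seen:
            seen.add(accession)
            accessions.append(accession)
    return accessions


def load_requested_accessions(s3_client, url):
    """Download the accession file with a boto3-style client and parse it."""
    bucket, key = parse_s3_url(url)
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return parse_accessions(body.decode("utf-8"))
```

Passing the client in, rather than creating it inside the function, keeps the parsing logic testable without AWS credentials; the management command itself would create the client and hand the resulting accession list to whatever gathers the report values.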

Values of interest:

Solution or next step

jaclyn-taroni commented 1 year ago

I agree with most of these, @davidsmejia. Why include total run time? For calculating cost?

davidsmejia commented 1 year ago

@jaclyn-taroni yeah, I think for now this could help us understand how long it took to process the request. It would be the difference between the first created job/survey and the last finished job. It's less about costs and more about getting a feel for engineering time vs. processing time.
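The total run time described above is the span from the earliest job or survey creation to the latest job completion. A small sketch, assuming each job (survey jobs included) has been reduced to a hypothetical `(created_at, finished_at)` pair of datetimes, with `finished_at` set to `None` for jobs that have not finished. The actual refinebio model fields are not shown here.

```python
from datetime import datetime


def total_run_time(jobs):
    """Return the timedelta from the first created job to the last finished job.

    `jobs` is an iterable of (created_at, finished_at) pairs. Unfinished jobs
    (finished_at is None) still count toward the start time but not the end.
    """
    jobs = list(jobs)
    if not jobs:
        raise ValueError("no jobs to measure")
    start = min(created for created, _ in jobs)
    finished = [end for _, end in jobs if end is not None]
    if not finished:
        raise ValueError("no finished jobs yet")
    return max(finished) - start


if __name__ == "__main__":
    example_jobs = [
        (datetime(2023, 6, 1, 9, 0), datetime(2023, 6, 1, 10, 30)),
        (datetime(2023, 6, 1, 9, 15), datetime(2023, 6, 2, 8, 0)),
        (datetime(2023, 6, 1, 11, 0), None),  # still running
    ]
    print(total_run_time(example_jobs))  # prints 23:00:00
```

Because this is wall-clock time between two endpoints, it includes any idle time or retries in between, which fits the goal of comparing overall turnaround rather than compute cost.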

arkid15r commented 12 months ago

Implemented in https://github.com/AlexsLemonade/refinebio/pull/3387