Tracking Data Requests - Githubissues

davidsmejia commented 1 year ago

Context

We are going to process some experiments as a request soon. We will want to be able to generate a report based on how the processing went.

Problem or idea

The initial step for tracking this should be to create a management command. This command should accept a single argument: an S3 URL to a file that contains the requested experiment accessions, one accession per line. We will use this file to fetch all of the required values of interest listed below.
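As a rough sketch of the input handling described above (not the actual implementation), the command could split the S3 URL into a bucket and key and then read one accession per line. The helper names `parse_s3_url` and `parse_accessions` are hypothetical, and skipping blank lines and duplicates is an assumption. Downloading the file and wiring this into a Django management command are left out here.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3://bucket/key URL into a (bucket, key) tuple."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected a URL like s3://bucket/key, got: {url!r}")
    return parsed.netloc, key


def parse_accessions(text):
    """Return unique accessions in first-seen order, one per line.

    Blank lines and surrounding whitespace are ignored (an assumption;
    the issue only says "one accession per line").
    """
    seen = {}
    for line in text.splitlines():
        accession = line.strip()
        if accession:
            seen.setdefault(accession, None)
    return list(seen)
```

For example, `parse_s3_url("s3://some-bucket/requests/list.txt")` would give `("some-bucket", "requests/list.txt")`, where the bucket and key are made up for illustration.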

Values of interest:

Experiments attempted, total and breakdown by source
Samples attempted, total and breakdown by source
Experiments available, total and breakdown by source
Samples available, total and breakdown by source
Total number of jobs created
Total run time
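To make the "total and breakdown by source" values concrete, here is a minimal sketch of how they could be computed once the experiments or samples have been looked up. It assumes each record is a plain dict with hypothetical `source` and `available` fields; the real models and field names may differ.

```python
from collections import Counter


def summarize_by_source(records):
    """Count attempted and available records, in total and per source.

    Each record is assumed to be a dict with a "source" string and an
    "available" boolean (hypothetical field names for illustration).
    """
    attempted = Counter(r["source"] for r in records)
    available = Counter(r["source"] for r in records if r["available"])
    return {
        "attempted_total": sum(attempted.values()),
        "attempted_by_source": dict(attempted),
        "available_total": sum(available.values()),
        "available_by_source": dict(available),
    }
```

The same function could be called once for experiments and once for samples, which covers the first four values in the list.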

Solution or next step

Determine the appropriate bucket for this list to be kept
Implement the management command

jaclyn-taroni commented 1 year ago

I agree with most of these, @davidsmejia. Why include total run time? For calculating cost?

davidsmejia commented 1 year ago

@jaclyn-taroni yeah, I think for now this could be helpful for understanding how long it took to process the request. This would be the difference between the first created job or survey and the last finished job. It's less about costs and more about getting a feel for how much engineering time vs. processing time was involved.
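A small sketch of that calculation, assuming each job is a dict with hypothetical `created_at` and `end_time` datetimes (with `end_time` set to `None` for jobs that have not finished):

```python
from datetime import datetime


def total_run_time(jobs):
    """Return the span from the earliest created job to the latest finished job.

    Returns None when there are no jobs or none of them has finished yet.
    Field names are assumptions for illustration.
    """
    created = [job["created_at"] for job in jobs]
    finished = [job["end_time"] for job in jobs if job["end_time"] is not None]
    if not created or not finished:
        return None
    return max(finished) - min(created)
```

The result is a `datetime.timedelta`, so it measures wall-clock time for the whole request rather than the summed run time of individual jobs.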

arkid15r commented 12 months ago

Implemented in https://github.com/AlexsLemonade/refinebio/pull/3387

AlexsLemonade / refinebio

Tracking Data Requests #3361