Open adknaupp opened 6 months ago
I'm no longer planning on implementing the functionality above, i.e. polling for jobs to stage. Instead, jobs will be identified while handling project requests. This makes it much easier to identify which jobs are part of a project. From the user's perspective, the only thing that will change is that they will have to either wait until all analyses are complete before adding a dataset to the project, or "save" an existing project to trigger a request that will cause any new jobs to be identified.
Job files should be staged in their own subfolder of the folder of the associated dataset. This means that although some files generated by a job associated with a parent dataset may relate only to a given child dataset, the files will be found within the parent dataset's folder, not that of the child dataset.
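To make the folder layout concrete, here is a minimal sketch of how the staging path could be built. The names `staging_root`, `job_staging_dir`, and the `job_<id>` folder naming are assumptions for illustration, not something the app defines yet.

```python
from pathlib import Path


def job_staging_dir(staging_root: Path, parent_dataset_uuid: str, job_id: int) -> Path:
    """Return the folder where a job's files would be staged.

    The job folder always sits inside the parent dataset's folder, even when
    some of the job's files relate only to a child dataset.
    (Folder naming here is hypothetical.)
    """
    return staging_root / parent_dataset_uuid / f"job_{job_id}"
```

Note that the child dataset's uuid is deliberately not part of the path: only the parent dataset determines where the job folder lives.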
Method
Identify analyses associated with shared datasets
1. Every 5-10 seconds, get all `RUNNING` jobs.
Use `SmrtLinkClient.get_analysis_jobs_by_state()` to get (probably) only those jobs whose state is `RUNNING`. This should get all possible jobs of interest, based on the assumption that any job the app is interested in will have run for at least 5-10 seconds.
2. Ignore all but the "new" jobs
Each time the active jobs are GET'ed from the database, most will have already been handled. To identify the new jobs, there must be some way of determining the jobs that are currently being polled. The 'new' jobs are all the jobs not being polled.
How to keep track of which jobs are being polled
???
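One possible answer, offered only as a sketch: keep an in-memory set of the job ids currently being polled and take the difference on every pass. This assumes each job is a dict with an `"id"` key, which may not match what the client actually returns; `select_new_jobs` and `polled_job_ids` are hypothetical names.

```python
def select_new_jobs(running_jobs, polled_job_ids):
    """Return the jobs that are not yet being polled and mark them as polled.

    running_jobs: list of job dicts, each assumed to carry an "id" key.
    polled_job_ids: set of ids already being polled (updated in place).
    """
    new_jobs = [job for job in running_jobs if job["id"] not in polled_job_ids]
    polled_job_ids.update(job["id"] for job in new_jobs)
    return new_jobs


# Two consecutive passes over the RUNNING jobs.
polled_job_ids = set()
first_pass = select_new_jobs([{"id": 1}, {"id": 2}], polled_job_ids)
second_pass = select_new_jobs([{"id": 2}, {"id": 3}], polled_job_ids)
```

An id would also need to be removed from the set (e.g. `polled_job_ids.discard(job_id)`) once polling for that job ends, and the set would not survive an app restart.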
3. Further, ignore any job not associated with a shared dataset
A new table needs to be created in the `peewee` database, and the project table should be modified to remove the datasets column. Instead, the new table will keep track of shared datasets and which project they are associated with. One column of the new table should store a dataset uuid and the other a project id.
4. Start polling each remaining job until it changes state.
Use `SmrtLinkClient.poll_for_successful_job()` to poll until the job changes state. Once the function returns, check whether the final state was `SUCCESSFUL`.
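Steps 3 and 4 could look roughly like the sketch below. To keep it dependency-free it uses the standard-library `sqlite3` module rather than `peewee`, and a generic polling loop rather than the real `SmrtLinkClient.poll_for_successful_job()`, whose exact signature and return value are not assumed here. The table name `shared_dataset`, the `"dataset_uuid"` key on a job, and all function names are hypothetical.

```python
import sqlite3
import time


def create_shared_dataset_table(conn):
    """Create the two-column table: one dataset uuid and one project id per row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shared_dataset ("
        " dataset_uuid TEXT NOT NULL,"
        " project_id INTEGER NOT NULL,"
        " PRIMARY KEY (dataset_uuid, project_id))"
    )


def projects_for_dataset(conn, dataset_uuid):
    """Return the ids of the projects a dataset is shared with."""
    rows = conn.execute(
        "SELECT project_id FROM shared_dataset"
        " WHERE dataset_uuid = ? ORDER BY project_id",
        (dataset_uuid,),
    )
    return [row[0] for row in rows]


def jobs_on_shared_datasets(conn, jobs):
    """Step 3: drop every job whose dataset is not in the table.

    Each job is assumed to be a dict with a "dataset_uuid" key.
    """
    return [job for job in jobs if projects_for_dataset(conn, job["dataset_uuid"])]


def wait_until_not_running(get_state, job_id, interval=5.0, sleep=time.sleep):
    """Step 4 stand-in: poll until the job leaves RUNNING, then return its state.

    get_state is any callable mapping a job id to a state string.
    """
    while True:
        state = get_state(job_id)
        if state != "RUNNING":
            return state
        sleep(interval)
```

The composite primary key assumes a dataset may be shared with more than one project. The caller would compare the value returned by `wait_until_not_running` against `"SUCCESSFUL"` before staging any files.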