ModelSEED / PlantSEED

Repository for containing code and files that pertain to the use of PlantSEED in the ModelSEED environment

ToDo List for PlantSEED Workshop 2016 #17

Closed samseaver closed 7 years ago

samseaver commented 8 years ago

I'm now editing this list in place, so that there won't be any surprises when I get back from England:

1) Development Deployment: At time of writing, the development environment is active, and we've successfully deployed a function for uploading fasta files to this environment. The current "working" branch of the development environment is: https://github.com/samseaver/ProbModelSEED/tree/plantseed_workshop_functions

I have not yet merged this with Chris' new KBase_ModelSEED_consolidation branch. Until that merge is done, the modeling tools of the development environment will not work as expected.

2) Workshop Website: Neal finished this: http://modelseed.org/events/events/plantseed2016 though he says the URL will be updated, and Claudia has been informed of the website.

3) Account Creation: We will be using RAST accounts. As far as we understand, all workshop participants have already generated a RAST account.

4) File upload: Neal has completed the front-end capability for loading files into shock. It passes the shock ID and a name to a back-end function that downloads the file, either plain text or gzipped, and, in the case of fasta, creates a genome object. The function uses the passed name parameter to create a new modelfolder of the same name, and a new genome object within that folder.

The next step, which should be straightforward, is to implement the same approach for uploading gene/transcript abundances as a TSV file. The ensuing workspace object would likely be a shock upload node, from which the data would be retrieved when running transcriptomics FBA.

Notes:

a) The back-end function needs better error handling, file validation(?), and to return non-null result strings for effective debugging.

b) Neal will test and improve the front-end function so it handles problematic scenarios better (such as when the shock server is down).

c) The current PlantSEED website does not yet show the list of genomes properly under the "My Genomes" tab. This needs to be fixed so that it can find the genomes listed as //plantseed//genome
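As a rough illustration of the file validation mentioned in note a), here is a minimal Python sketch that accepts plain or gzipped fasta bytes and raises a descriptive error when the content is not fasta. The function name `read_fasta` and the error messages are hypothetical, and this is not the existing back-end function, only one way such a check could look.

```python
import gzip


def read_fasta(raw: bytes) -> dict:
    """Parse plain or gzipped FASTA bytes into {sequence_id: sequence}.

    Hypothetical helper for illustration; raises ValueError on invalid input.
    """
    # gzip streams always start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("file is not plain-text FASTA") from None

    records = {}
    current_id = None
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError(f"line {line_number}: empty FASTA header")
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"line {line_number}: duplicate id {current_id!r}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError(f"line {line_number}: sequence data before any '>' header")
        else:
            records[current_id].append(line)

    if not records:
        raise ValueError("no FASTA records found")
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}
```

Returning a specific message for each failure would also give the front-end a non-null string to display when an upload is rejected.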

5) Object editing: We will allow users to edit and save:

Currently, users can edit the function of a feature; this ability was enabled for last year's workshop.

6) Object visualization: We will allow users to visualize the gene expression data in a simple table.

7) Update Reference Data: I have updated the organization of the reference data including the ten PlantSEED genomes, so they can now be viewed properly within the PlantSEED website (needs testing).

The annotation of the genomes in turn needs to be updated with the annotation from the current work in Protein Families for PlantSEED v2.

The "Copy" function has been disabled on the back-end: there's an issue with the genome reference that is stored for a model and its features, and we need to revise how this is organized within the model.

8) Annotation Pipeline: A back-end function will be created that allows the annotation of new sequence sets using PlantSEED Kmers. I have a new set of Kmers ready to go, and it's always easy to update, so my first option is to create an app in the app_service that uses a cached set of Kmers to scan the sequences. This would be an easier option than working with Bob, though I can't yet anticipate potential problems. The Kmer file currently sits at ~6MB, so size isn't a problem, but it has 717K Kmers, so iteration is the issue.
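One way to sidestep the iteration concern, sketched in Python: load the Kmers once into a hash table and slide a window over each sequence, so the cost scales with sequence length and not with the 717K Kmers. The file layout (one tab-separated `kmer<TAB>function` pair per line), the fixed Kmer length `k`, and the majority-vote rule are all assumptions made for illustration; the source does not describe the actual Kmer format or scoring.

```python
from collections import Counter


def load_kmers(lines):
    """Build {kmer: function} from 'kmer<TAB>function' lines (assumed format)."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kmer, function = line.split("\t", 1)
        table[kmer] = function
    return table


def annotate(sequence, table, k):
    """Return the function with the most k-mer hits in the sequence, or None.

    Each window costs one dictionary lookup, so the Kmer set is never iterated.
    """
    hits = Counter()
    for start in range(len(sequence) - k + 1):
        function = table.get(sequence[start:start + k])
        if function is not None:
            hits[function] += 1
    if not hits:
        return None
    # Majority vote is an assumed scoring rule, not the PlantSEED method.
    return hits.most_common(1)[0][0]
```

With this layout the cached Kmer table only needs to be built once per app start and can be reused across all submitted sequences.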

samseaver commented 8 years ago

OK, this comment will review what was done in the previous list, and the following list will be the updated ToDo list.

1) Done, including merger with KBase_ModelSEED_consolidation

2) Done, will send to Claudia soon

3) Covered.

4) Mostly done, needs testing, I would like to send to Claudia today (Thursday July 14th)

7) Partly done

5, 6, and 8) ToDo

samseaver commented 8 years ago

New ToDo list:

1) Restore the development website, which went down with the crash on Monday. Its proxy is in a Magellan VM, and neither Dan nor the Magellan staff has responded to us. Neal has taken to cloning the website in an AWS instance in order to give us greater control.

2) Genome File upload: It's working. The front end will load a fasta file, either plain or gzipped, directly into shock, and it passes the shock ID and a name to a back-end function that will download the file and create a genome object. The function uses the passed name parameter to create a new modelfolder of the same name, and a new genome object within that folder.

Notes:

a) The back-end function needs better error handling, especially if the file isn't fasta.

b) The front-end function might need better error handling for scenarios such as when the shock server is down.

c) The Plant genome viewer depends on a secondary "minimal_genome" object to quicken loading time. The function needs to be updated to create these objects too.

d) The genomes' user and/or auto-meta is missing keys or values, so I need to cross-check with Neal what should be present.

3) Gene Expression/Transcript Abundance upload: We will implement the same approach for uploading gene/transcript abundances as a TSV file. I will create the back-end function, which will be very similar to what is happening with the genome. The uploaded object will be saved in the workspace as a shock node, and will only be used for one of two things: visualization as a heat-map (which may be memory-intensive, so we will have to debate it) and application within transcriptomics FBA, which should work out of the box given the KBase ObjectAPI merger.
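For discussion, a minimal Python sketch of what parsing such a TSV might involve. The layout assumed here (a header row whose first column names the gene/transcript ID and whose remaining columns name conditions, followed by numeric rows) is a guess, since the file format has not been fixed yet, and `read_abundances` is a hypothetical name.

```python
import csv
import io


def read_abundances(tsv_text):
    """Parse abundance TSV text into (conditions, {gene_id: [values]}).

    Assumed layout: header row 'gene<TAB>cond1<TAB>cond2...', then numeric rows.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    try:
        header = next(reader)
    except StopIteration:
        raise ValueError("empty file") from None
    if len(header) < 2:
        raise ValueError("expected a gene column plus at least one condition column")

    conditions = header[1:]
    table = {}
    for row_number, row in enumerate(reader, start=2):
        if not row:  # csv.reader yields [] for blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"row {row_number}: expected {len(header)} columns, got {len(row)}"
            )
        try:
            table[row[0]] = [float(value) for value in row[1:]]
        except ValueError:
            raise ValueError(f"row {row_number}: non-numeric abundance") from None
    return conditions, table
```

The same `(conditions, table)` structure could feed both the simple table or heat-map view and the transcriptomics FBA step, which would keep the parsing in one place.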

4) Functional editing. We currently allow editing of media and the functions of features (i.e. updating their annotation if it is wrong or missing). I think we should allow the editing of models, but I'm not giving this high priority for the workshop. I'm not yet sure what users would find useful when it comes to editing models in place.

5) Annotation pipeline: This is the one I need to spend most of my time on from here on. There are two aspects of the annotation pipeline: one is the assignment of ModelSEED functional annotation using K-mers, and the other is the generation of plant similarities using BLAST.

6) I need to create a SandBox model, likely to go on the PlantSEED front page, that will be much smaller, can be copied, and is 'broken' so that users can gap-fill it and see the results.

7) Testing (07/25-07/29)

8) Additional functionality: I reviewed the workshop schedule to be sure that I didn't miss anything. There are three things that we said we'd show, but I need to look at them to see if they'd work the way we'd want them to:

a) Comparison - Comparing genomes/annotations/models?

b) Essentiality - Predicting essential genes and reactions?

c) Pathways - Identifying pathways responsible for observations?

d) Integrating plant-microbial models?

e) Editing sandbox model?