Closed ebrahimebrahim closed 2 years ago
I think this design can be simplified a bit. Since analysis is treated as one unit (unlike preprocessing), I think we could probably do with just one model, AnalysisResult.
So AnalysisResult would contain the following:
- status: Run status (pending/running/finished/failed)
- error_message: The error message if it failed.
- preprocessing_batch: The preprocessing batch it was run from. Linking it to this batch instead of the dataset will allow for running preprocessing with varying options, and propagating that to the analysis.
- zip: Link to the zipped up results
- atlas: The atlas used
- data: This is where the result images would live

I was thinking of placing all of the resulting images in the data field, in a JSON/dictionary structure. It would look like the following:
{
  "<variable> (e.g. Age)": {
    "allocation": {
      "correlation": "...",
      "pvalue": "..."
    },
    "transport": {
      "correlation": "...",
      "pvalue": "..."
    },
    "vbm": {
      "correlation": "...",
      "pvalue": "..."
    }
  }
}
This is the structure that made the most sense to me, since the image we display in the UI is the intersection of selected variable and selected analysis. This structure allows for easily indexing by these two values, and retrieval of both correlation and p-value images.
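To make the indexing concrete, here is a minimal sketch in plain Python of looking up both images for one (variable, analysis) pair. The file names and the `get_images` helper are hypothetical and only illustrate the shape of the structure; they are not part of the actual model.

```python
from typing import Dict, Tuple

# Hypothetical contents of the `data` field. The image references are
# placeholders; the real values would be whatever the analysis stores.
data: Dict[str, Dict[str, Dict[str, str]]] = {
    "Age": {
        "allocation": {"correlation": "age_allocation_corr.png", "pvalue": "age_allocation_p.png"},
        "transport": {"correlation": "age_transport_corr.png", "pvalue": "age_transport_p.png"},
        "vbm": {"correlation": "age_vbm_corr.png", "pvalue": "age_vbm_p.png"},
    },
}


def get_images(data: Dict[str, Dict[str, Dict[str, str]]], variable: str, analysis: str) -> Tuple[str, str]:
    """Return (correlation, pvalue) image references for one variable/analysis pair."""
    entry = data[variable][analysis]
    return entry["correlation"], entry["pvalue"]


# The UI selection (variable + analysis) maps directly onto two dictionary lookups.
correlation_image, pvalue_image = get_images(data, "Age", "transport")
```

A missing variable or analysis raises `KeyError` here; whether that should instead return an empty result is a design choice the sketch leaves open.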
@ebrahimebrahim What do you think?
@AlmightyYakob This makes a lot of sense. I didn't realize you can stick a json/dictionary structure into one of these django data model things. This organization looks great!
It seems some revision of the preprocessed image models is needed. In short, the following will take place as a part of addressing this issue:
- The atlas field on each preprocessed image will be migrated into the associated preprocessing batch, and removed from the images themselves.
- On the AnalysisResult model, the atlas isn't actually needed, as it's already stored on the preprocessing batch. We can just point to (likely join) that whenever needed.
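As a rough illustration of "pointing to" the batch's atlas instead of storing it twice, here is a sketch using plain dataclasses rather than the real Django models. The field types and the `atlas` property are assumptions; in Django this would presumably be a `ForeignKey` to the preprocessing batch, followed at query time.

```python
from dataclasses import dataclass


@dataclass
class PreprocessingBatch:
    # The atlas is stored once per batch instead of once per preprocessed image.
    atlas: str


@dataclass
class AnalysisResult:
    # Stand-in for a foreign key to the preprocessing batch (assumed).
    preprocessing_batch: PreprocessingBatch
    status: str = "pending"

    @property
    def atlas(self) -> str:
        # No atlas column of its own: the value is read through the batch.
        return self.preprocessing_batch.atlas


batch = PreprocessingBatch(atlas="example_atlas")  # placeholder atlas name
result = AnalysisResult(preprocessing_batch=batch)
```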
- AnalysisBatch: just like we have the concept of PreprocessingBatch, to allow running analysis more than once with different settings, to allow us to handle errors that arise during analysis runs, and to provide the notion of an analysis batch ID so that we know which run generated various artifacts later on.
- AnalysisResult: represents the results of an AnalysisBatch that ran successfully. Contains:
  - CorrelationResults. There will be one of these for each variable (e.g. age, CDR, etc.)
- CorrelationResult: Has the correlation analysis information for a specific variable. Contains:
  - AnalysisResult
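The relationships in this proposal can be sketched with plain dataclasses. This is not the actual Django code: the field names `id`, `status`, `error_message`, `batch`, and `variable` are assumptions added for illustration, and only the three model names and their links come from the proposal above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalysisBatch:
    # One analysis run; the ID identifies which run produced which artifacts.
    id: int
    status: str = "pending"   # assumed: pending/running/finished/failed
    error_message: str = ""   # assumed: populated only when the run fails


@dataclass
class AnalysisResult:
    # Exists only for a batch that ran successfully.
    batch: AnalysisBatch


@dataclass
class CorrelationResult:
    # One per variable (e.g. age, CDR), linked back to its AnalysisResult.
    analysis_result: AnalysisResult
    variable: str


# A successful run yields one AnalysisResult and one CorrelationResult per variable.
batch = AnalysisBatch(id=1, status="finished")
result = AnalysisResult(batch=batch)
correlations: List[CorrelationResult] = [
    CorrelationResult(analysis_result=result, variable=name) for name in ("age", "CDR")
]
```

In this sketch AnalysisResult carries nothing beyond its link to the batch, which is one way to see why merging it into AnalysisBatch is worth considering.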
@AlmightyYakob What do you think of an organization like this? Let's iterate on this from here. Maybe AnalysisResult is redundant and could be merged into AnalysisBatch?