ActiveBrainAtlas2 / activebrainatlasadmin

This is the ActiveBrainAtlas database portal. This project provides the admin area to edit data associated with the Active Brain Atlas project. It also provides the REST API.

Restore annotations (testing) #25

Closed drinehart1 closed 1 year ago

drinehart1 commented 2 years ago

Do newly created volumes restore correctly from the archive?

drinehart1 commented 1 year ago

I believe the data is stored in a JSON table (prior to parsing). Restoring the full table works, but restoring individual annotation tables (separate, stand-alone restores) does not.

eddyod commented 1 year ago

Please confirm that this is the correct restore annotations process:

"""Restore a set of annotations associated with a session.
1. Find existing session that is inactive and has been archived
2. Move that archived data (from annotations_point_archive) to either marked_cell, polygon_sequence or structureCOM
3. Delete that archived data
4. Update that session to be active
"""

Questions:

  1. There is another table archive_set that is not being used. Was there a purpose to this?
  2. If there is data in the marked_cells, polygon_sequences or structure_com table, why is that same session data also in the annotations_points_archive table? Shouldn't it be one or the other?

drinehart1 commented 1 year ago

I don't think we need to delete the archived data (step #4). The historical information sets should be immutable. If you meant cleaning out the current working set prior to restoring, that may be workable; however, that set, too, should be archived prior to any restore.

1) The archive_set table was created to store snapshots of archived points, polygons and cells. It may be removed if it is not currently used, but we need to be clear about how we store archived sets. It may be easier (if we don't need to provide summary statistics on the data) to simply store snapshots in JSON format in a NoSQL DB (document store) and pull out an archived dataset when we want to restore a specific set.

2) This apparent discrepancy comes from the difference between the current dataset snapshot (which I believe should be stored in these tables: marked_cells, polygon_sequences or structure_com) and the archived sets, which need to be stored somewhere. That 'somewhere' could be JSON in a NoSQL DB or a relational table (annotations_points_archive), but it must be separate from the active dataset. The decision to store archives in relational table format (e.g. marked_cells_archive) as a mirror of the current, active set hinges on what summary/descriptive statistics (or other operations) we need to run on prior data snapshots. If we archive solely so that we don't lose data and can restore it, NoSQL document storage would be fine. If we actively query 'old' data (time series, productivity, progress, etc.), we may be forced to store it in relational format. This will eventually result in a bloated, slow RDBMS, whereas a NoSQL solution will not.

madwilliam commented 1 year ago

Hey Ed, process 1, 2 and 5 are correct, but we don't delete points. Instead we move them into the archive. The archive set, I believe, is where we store the archived sessions, and the annotations_archive table is where we store old annotations. None of the annotations are duplicated.

I am a bit fuzzy on the archive_set, but I think it is being used.

eddyod commented 1 year ago

Duane, I just want to get this process working. We can think about using NoSQL later. William, I updated those numbers above as I had two '2's. But when I say delete, I'm talking about moving data from the annotations_point_archive to one of the 3 original tables. That same data should not exist in both tables. I've now got the restore process working well and I will now focus on the original insertion (Save) process. Here is what I think happens, feel free to correct me:

  1. User clicks 'Save annotations' in Neuroglancer.
  2. If there is existing data with the same session, it gets moved to the archive
  3. The new data gets put in one of the 3 original tables.

What confuses me now is: when does an annotation session become inactive?

eddyod commented 1 year ago

I've redone the archiving and restoring feature in annotations. Here is a link to the SQL changes: https://github.com/ActiveBrainAtlas2/activebrainatlasadmin/blob/master/sql/2022-12-01.updates.sql

Here is a synopsis of the process:

A. Saving annotations - when the user clicks 'Save annotations' in Neuroglancer:

  1. All data from the active layer gets sent to one of the three tables:

    • Marked cells
    • Polygon sequences
    • Structure COM
  2. Data also gets sent to the annotations_point_archive table. This table has a unique constraint, so when the same data gets sent to the database again, the duplicate rows are skipped instead of creating new useless inserts. This is done by Django's built-in bulk_create method with the 'ignore_conflicts' flag set to true. It also finds an existing archive, or creates a new one, and uses that key for the FK_archive_set_ID (yes, the archive_set table is being used). The constraint is on these columns:

    • Session ID (FK_session_id)
    • x Decimal(8,2) - formerly a float
    • y Decimal(8,2) - formerly a float
    • z Decimal(8,2) - formerly a float
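
To illustrate how the unique constraint keeps repeated saves from piling up in the archive, here is a minimal sketch using Python's built-in sqlite3 module rather than the project's Django models. The schema is a simplified assumption (only the columns named above), the helper names `make_archive_db` and `archive_points` are hypothetical, and `INSERT OR IGNORE` stands in for what `bulk_create(..., ignore_conflicts=True)` asks the database to do. Coordinates are rounded to two places to mimic Decimal(8,2).

```python
import sqlite3


def make_archive_db():
    """Create an in-memory archive table (simplified, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE annotations_point_archive (
            id INTEGER PRIMARY KEY,
            FK_session_id INTEGER NOT NULL,
            FK_archive_set_ID INTEGER NOT NULL,
            x NUMERIC NOT NULL,
            y NUMERIC NOT NULL,
            z NUMERIC NOT NULL,
            UNIQUE (FK_session_id, x, y, z)
        )
        """
    )
    return conn


def archive_points(conn, session_id, archive_set_id, points):
    """Insert points, silently skipping rows that violate the unique
    constraint. Returns the number of rows actually added."""
    count_sql = "SELECT COUNT(*) FROM annotations_point_archive"
    # Round to two decimals to mimic the Decimal(8,2) columns.
    rows = [
        (session_id, archive_set_id, round(x, 2), round(y, 2), round(z, 2))
        for x, y, z in points
    ]
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO annotations_point_archive "
        "(FK_session_id, FK_archive_set_ID, x, y, z) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    after = conn.execute(count_sql).fetchone()[0]
    return after - before


conn = make_archive_db()
points = [(1.0, 2.0, 3.0), (4.5, 5.5, 6.5)]
first_save = archive_points(conn, 7, 1, points)   # both rows are new
second_save = archive_points(conn, 7, 1, points)  # same data, nothing added
```

In this sketch, saving the same points for the same session a second time adds no rows, while the same coordinates under a different session ID are treated as new data because the session ID is part of the constraint.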

B. Restoring annotations

  1. This occurs when the user checks one and only one checkbox on the archive page. After selecting a checkbox, the user chooses the 'Restore the selected archive' option from the dropdown menu. Once the user clicks 'Go', these events take place:

    a. Get requested archive (set of points in the annotations_points_archive table)
    b. Mark session inactive that is in the archive
    c. Create a new active session and add it to either marked_cell, polygon_sequence or structureCOM
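
The restore steps above can be sketched in plain Python. This is only an in-memory model of the flow, not the code running on the server: the `Session` and `Store` classes, the `restore_archive` function and the dictionary layout are all hypothetical, and the archive is deliberately left untouched, in line with the earlier point that archived sets should be immutable.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Session:
    id: int
    annotation_type: str  # "marked_cell", "polygon_sequence" or "structure_com"
    active: bool = True


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database tables."""
    sessions: dict = field(default_factory=dict)  # session id -> Session
    archive: dict = field(default_factory=dict)   # archive id -> (session id, points)
    tables: dict = field(default_factory=lambda: {
        "marked_cell": [], "polygon_sequence": [], "structure_com": [],
    })
    _ids: count = field(default_factory=lambda: count(1))

    def new_session(self, annotation_type):
        session = Session(next(self._ids), annotation_type)
        self.sessions[session.id] = session
        return session


def restore_archive(store, archive_id):
    # a. Get the requested archive (a set of archived points).
    session_id, points = store.archive[archive_id]
    # b. Mark the session referenced by the archive as inactive.
    old_session = store.sessions[session_id]
    old_session.active = False
    # c. Create a new active session and copy the points into the
    #    table that matches the annotation type.
    new_session = store.new_session(old_session.annotation_type)
    store.tables[old_session.annotation_type].extend(
        (new_session.id, point) for point in points
    )
    return new_session


store = Store()
original = store.new_session("marked_cell")
store.archive[1] = (original.id, [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
restored = restore_archive(store, 1)
```

After the call, `original` is inactive, `restored` is a new active session that owns copies of the archived points, and the archive entry itself is unchanged, so the same archive could be restored again later.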

eddyod commented 1 year ago

This has been implemented on the server. Close at your leisure.