This idea sounds good; I'll see if it is doable with Django.
Should be done on alyx-dev! Please test and let me know. The extra fields are: subject, date, number.
@peterzh can you test this? If it works, we want to incorporate it into alyx.registerFile for the case where a sessionID is not provided.
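For reference, a test along these lines should exercise the new behavior (the host, token, and exact field names here are assumptions, not the confirmed alyx API):

```python
# Hypothetical test of the new behavior: POST a dataset with
# subject/date/number instead of a session UUID.
import requests

r = requests.post(
    "https://alyx-dev.example.org/datasets",
    headers={"Authorization": "Token <your-token>"},
    json={
        "subject": "MyMouse",   # the three new fields
        "date": "2018-02-02",
        "number": 2,
        "dataset_type": "eye.raw",
    },
)
r.raise_for_status()
print(r.json())  # the server should have found or created the session
```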
@nsteinme @rossant This works! However, when it creates the session on the server, the session's "user" field is empty. Would it be possible to set the user to the person who submitted the original POST command for the dataset?
Ah, one thing I just noticed. The auto-generated sessions should have the "Type" field set as follows: base sessions should have type="Base", and sub-sessions should have type="Experiment". Also, each sub-session should have its base session as its parent, in the "parent_session" field. Perhaps it would also be worth auto-populating the "narrative" field with "auto-generated session", as we have it for sessions made from MC.
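As a rough Django-side sketch of what that could look like (the module path, model, and field names here are assumptions based on the fields named in this thread, not the actual alyx code):

```python
from actions.models import Session  # hypothetical import

def get_or_create_sessions(subject, date, number):
    # Reuse the base session for this subject/date if one exists.
    base = Session.objects.filter(subject=subject, date=date,
                                  type="Base").first()
    if base is None:
        base = Session.objects.create(subject=subject, date=date,
                                      type="Base",
                                      narrative="auto-generated session")
    # The sub-session points back to the base session via parent_session.
    sub = Session.objects.create(subject=subject, date=date, number=number,
                                 type="Experiment", parent_session=base,
                                 narrative="auto-generated session")
    return base, sub
```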
OK done!
alyx.registerFile has now been modified to incorporate these changes. I think we also have one other, related issue: registering a file involves two submission steps. First, submit a dataset. Second, submit a filerecord pointing to that dataset. If alyx is down, the filerecord cannot be posted, because it doesn't yet know the dataset's URL. Should we therefore create a similar process for this? @nsteinme @rossant
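To make the current two-step flow concrete, here is an illustrative sketch (endpoint paths and field names are assumptions based on this thread, not the exact alyx API):

```python
import requests

BASE = "https://alyx.example.org"
AUTH = {"Authorization": "Token <your-token>"}

# Step 1: post the dataset.
ds = requests.post(BASE + "/datasets", headers=AUTH, json={
    "session": "<session-uuid>",
    "dataset_type": "eye.raw",
    "data_format": "avi",
}).json()

# Step 2: post the filerecord, which needs the dataset's URL from step 1.
# If alyx was down during step 1, this URL is unknown: that is the
# problem described above.
requests.post(BASE + "/files", headers=AUTH, json={
    "dataset": ds["url"],
    "data_repository": "zserver",
    "relative_path": "MyMouse/2018-02-02/2/eye.raw.avi",
})
```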
Good point, dang. This one might be trickier, e.g. what if you have one camera pointing at the left eye and one pointing at the right eye? There will be two datasets with type "eye.raw" and the same subject/date/number. Perhaps for the filerecord, the client can provide subject/date/number/datasetType, and the server will just use the most recently created dataset, in the case that more than one matches those criteria? What do you think?
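On the server side, that lookup could be something like the following sketch (every model and field name here is an assumption, not the actual alyx schema):

```python
from data.models import Dataset  # hypothetical import

def find_dataset(subject, date, number, dataset_type):
    # Take the most recently created dataset matching all four criteria.
    return (Dataset.objects
            .filter(session__subject__nickname=subject,
                    session__start_time__date=date,
                    session__number=number,
                    dataset_type__name=dataset_type)
            .order_by("-created_datetime")
            .first())
```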
That could work. Alternatively, we could add a "creation_date" field to the filerecord table (i.e. the datetime the file was created on disk). This field already exists in the dataset, so it would usually be redundant, but in this case the correct matching between filerecord and dataset could be made by comparing the file creation dates in each. Perhaps this is a clunky solution, though.
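As a sketch of that matching (field names assumed):

```python
from data.models import Dataset  # hypothetical import

def match_by_creation_date(filerecord):
    # Pair the orphaned filerecord with the dataset whose creation_date
    # equals the file's on-disk creation time.
    return Dataset.objects.filter(
        created_datetime=filerecord.creation_date).first()
```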
For the dataset, is that field supposed to indicate the start time of the data acquisition, though? @rossant, what do you think about this problem? Does it make sense to you? Any ideas?
The creation_date field in the dataset is just the datetime of the file's creation on the server; that was its intention as far as I'm aware. Information relating to data-acquisition timing should probably be within the files themselves, not as metadata on alyx.
I think this problem would be solved if we implement what is currently discussed on the mailing list, with a potential new POST-only endpoint that is specifically designed to create datasets, associated file records, and perhaps also to initiate file transfers.
Yes, I was thinking the same thing. Is the intended behavior clear to you, or should we write it out? I think the basic concept is: it does what you would have done with two separate posts to /datasets and to /files, and it can also optionally accept this new specification of subject/date/experimentNumber and deal with sessions appropriately from that.
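For instance, a single call to such an endpoint might look like this; the endpoint name and fields are hypothetical at this stage of the discussion:

```python
import requests

# One POST replacing the separate /datasets and /files posts.
requests.post("https://alyx.example.org/register-file",
              headers={"Authorization": "Token <your-token>"},
              json={
                  "subject": "MyMouse",
                  "date": "2018-02-02",
                  "number": 2,
                  "dataset_type": "eye.raw",
                  "data_format": "avi",
                  "relative_path": "MyMouse/2018-02-02/2/eye.raw.avi",
              })
```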
Yes. Do you think this endpoint should also initiate the file transfers for the missing files? To be clear, POSTing to /data-transfer with the appropriate information could do this (a rough code sketch follows the list):

- Create the dataset, linking to the session according to the session id if provided, or subject/date/experimentNumber.
- For each project associated to the session:
  - For each data repository associated to the project:
    - Create a file record for that data repository, with exists=False.
- For each file record associated to the dataset:
  - Use globus (the ID is specified in the data repository model) to find out whether the file actually exists on the server.
  - If so, update the exists field to True.
- For each file record where exists=False:
  - Initiate a globus file transfer from a globus endpoint where the file exists to the globus endpoint associated to that missing file record.
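A rough sketch of that flow on the server side; the model names, related-manager names, and the helper functions (find_or_create_session, globus_file_exists, initiate_globus_transfer) are all assumptions for illustration:

```python
from data.models import Dataset, FileRecord  # hypothetical imports

def handle_data_transfer(payload):
    # Resolve the session by id, or by subject/date/experimentNumber.
    session = find_or_create_session(payload)
    dataset = Dataset.objects.create(session=session,
                                     dataset_type=payload["dataset_type"])

    # One file record per data repository of each project on the session.
    for project in session.projects.all():
        for repo in project.data_repositories.all():
            FileRecord.objects.create(dataset=dataset, data_repository=repo,
                                      exists=False)

    # Ask globus which files are actually on disk.
    for fr in dataset.file_records.all():
        if globus_file_exists(fr.data_repository.globus_id, fr.relative_path):
            fr.exists = True
            fr.save()

    # Transfer from an endpoint that has the file to each one that doesn't.
    source = dataset.file_records.filter(exists=True).first()
    for fr in dataset.file_records.filter(exists=False):
        initiate_globus_transfer(source, fr)
```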
I don't know anything about the file transfers - that'll be for you to decide with Kenneth...
Hey @rossant - just had a discussion with @k1o0 and @peterzh about whether we can make our code immune to alyx crashing or becoming inaccessible (as just happened with zserver, which created big problems). The goal is: when alyx can't be accessed, the "post" commands should be queued locally, so that when it's accessible again everything can be added properly. The queueing is already implemented (thanks @petersaj) and works fine, but there is a problem: when we post datasets, we need to specify the session UUID that the dataset should go with. If we don't have a connection to alyx, we can't get that UUID. In principle the dataset could be posted anyway (with a null session UUID), but it would be "orphaned" and not searchable.
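For concreteness, here is a minimal sketch of the queue-and-flush idea; the file layout and function names are illustrative, not the actual implementation:

```python
import json
import os
import requests

QUEUE_DIR = "alyx_queue"

def post_or_queue(url, payload):
    """POST to alyx; if it is unreachable, save the request to disk."""
    try:
        r = requests.post(url, json=payload, timeout=5)
        r.raise_for_status()
    except requests.RequestException:
        os.makedirs(QUEUE_DIR, exist_ok=True)
        fname = os.path.join(QUEUE_DIR,
                             "%06d.json" % len(os.listdir(QUEUE_DIR)))
        with open(fname, "w") as f:
            json.dump({"url": url, "payload": payload}, f)

def flush_queue():
    """Replay queued posts once alyx is reachable again."""
    if not os.path.isdir(QUEUE_DIR):
        return
    for fname in sorted(os.listdir(QUEUE_DIR)):
        path = os.path.join(QUEUE_DIR, fname)
        with open(path) as f:
            item = json.load(f)
        requests.post(item["url"], json=item["payload"]).raise_for_status()
        os.remove(path)
```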
We propose this solution: alter the REST API endpoint for posting datasets (or make a new endpoint) such that, rather than specifying a session UUID, you can instead specify a subject, date, and experimentNumber (this information is all known when the dataset is ready to be posted, with or without alyx). The server behavior should be: if a session matching that subject/date/experimentNumber already exists, attach the dataset to it; if not, create the session automatically and then attach the dataset.
In this way, we can queue something like "post:: subject: MyMouse; date: 2018-02-02; expNum: 2; datasettype: eyeMovie; dataformat: avi; etc", and it will get correctly posted the next time the queue can be flushed.
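Written out, a queued entry could look something like this (field names illustrative, matching the proposal rather than any confirmed API):

```python
# One item in the local queue, replayed when alyx is reachable again.
queued_post = {
    "endpoint": "/datasets",
    "payload": {
        "subject": "MyMouse",
        "date": "2018-02-02",
        "number": 2,
        "dataset_type": "eyeMovie",
        "data_format": "avi",
    },
}
```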
Thoughts?