cortex-lab / alyx

Database for experimental neuroscience laboratories

add dataset existing in a NAS via url endpoint #684

Open gmaggi opened 3 years ago

gmaggi commented 3 years ago

Hi,

I am starting to get familiar with Alyx (and ONE) to promote its use in my institution (a neuroscience lab). I managed to install it on a server and enable the Apache service, and all is fine so far. What I am trying to do is the following:

  1. In Alyx, under Data -> Dataset -> File records, I aim to add a dataset that exists on a NAS (FreeNAS) by using its URL endpoint. I've been able to do so, but the "on server" option is not checked (please see the attachment).
  2. The idea is to "link" this dataset to a session (right now the session is shown as None, as in the attachment).
  3. The final goal is to retrieve (download) the dataset on a client/workstation computer using the ONE libraries (Python and MATLAB). I've seen some ONE tutorials on how to print out the metadata, but I am missing how to actually download the dataset from the server.

Could you give me some guidance with this please?

Thanks, Giuliano.

[Screenshot 2021-05-24 at 21.10.13]
kdharris101 commented 3 years ago

Hi Giuliano,

Great to hear you are interested in Alyx! Not certain I fully understood the question but here is my best shot.

  1. The "on server" field has a specific role for when you have multiple individual labs that want to merge their data on a single central server. This is what happens in IBL, but I imagine not in your case. So I think it is fine for the field to remain unset.

  2. It should be possible to link the dataset to a session even without the "on server" field set. Has this been a problem?

  3. Does the client have access to the NAS? In that case it should be straightforward. If the client does not have access to the NAS, you need to set up some other means of getting the data to the client (e.g. an HTTP server), which is more complicated.

We are currently updating the ONE libraries, and this would be a great test case! So please stay in touch and we will make sure it works.

gmaggi commented 3 years ago

Hi Kenneth,

Thanks for the explanation!

  1. Merging files into a central storage server sounds interesting. We would later be interested in that feature, as users have data files on several workstations.
  2. I have not managed to link a dataset to a session; they are marked as "None". I think I am missing something here, so I explain below how I do it so far.
  3. The clients/users have access to the NAS via NFS and SMB. They can also view and download the data via WebDAV. The latter allows them to get the data via a URL, which is the one I pass to "DATA FILES -> Data repositories".
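Since the NAS is reachable over WebDAV, a file's download URL is essentially the repository's base URL joined with the file's path relative to that root. The sketch below shows that join with the Python standard library; the host name and folder layout are hypothetical, and it assumes the repository URL points at the root folder under which the data files live.

```python
from urllib.parse import quote, urljoin


def file_url(repository_url: str, relative_path: str) -> str:
    """Join a (hypothetical) WebDAV repository root with a file's relative path."""
    # urljoin replaces the last path segment unless the base ends with "/",
    # so make sure the repository root is treated as a folder.
    if not repository_url.endswith("/"):
        repository_url += "/"
    # Percent-encode each segment (spaces, colons, ...) but keep "/" separators.
    return urljoin(repository_url, quote(relative_path.lstrip("/")))
```

For example, file_url("https://nas.example.org/webdav/data", "mouse1/2021-05-27/001/log.raw.rtf") gives a URL that could then be fetched with urllib.request.urlopen or any HTTP client; a share that requires a login would also need credentials, which this sketch does not handle.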

What we pursue in a first stage is:

  1. Users add data (such as recordings) to the NAS via NFS/SMB (as is currently done). The dataset is afterwards inserted manually into Alyx as a "Data repository" by providing the WebDAV URL (in the future we might use Globus). Subsequently, users can create the session with the corresponding metadata.
  2. Users can afterwards view the metadata associated with the session and also download the dataset, all via Python/MATLAB methods.

What I do so far:

  1. I go to "DATA FILES -> Data repositories" and enter the WebDAV URL and other corresponding details. Then I create a dataset in "DATA FILES -> Datasets", and in "FILE RECORDS" I include the data repository set up previously.
  2. Next, I create a subject in "COMMON -> Subjects", then I create a session associated with this subject. However, I cannot add dataset information (see attachment). Am I missing something here? It does not seem to give the option to select a dataset.

Thanks, Giuliano.

[Screen Shot 2021-05-26 at 4.17.33 PM]

kdharris101 commented 3 years ago

OK, I see! I think these things will all be much easier using the REST endpoints: you can add files programmatically straight from the recording software, and it will take care of all of this for you. I'm not the best person to explain, though. @k1o0, could you take this?

k1o0 commented 3 years ago

Hi Giuliano,

As Kenneth mentioned, adding datasets is much more easily done through REST. We have endpoints for creating sessions and registering files, but first I'll go over the workflow for linking everything together. The first four steps are done through the web browser and only need to be done once per subject/dataset type etc.

  1. We first manually create a new subject through the admin interface. You can also optionally add a project to associate your subject with.
  2. We also create a data repository for each computer that will host the files. If the data repository has a non-empty data url field, then files registered to that server will have a green tick under the 'on server' column.
  3. We create a new dataset type for each data type that we'll be registering. A dataset type is like a dataset, but it includes wildcards in the name so that you can search over datasets with the same content but different formats, etc. For example, you could create a new dataset type called 'raw log' with the filename pattern *log.raw*. When you register a file such as _rig1_log.raw.txt or log.raw.rtf, it will automatically be part of the 'raw log' dataset type. The main purpose of this is to use the dataset type description field to document what the files are and how to work with them. When registered, each file must match exactly one dataset type.
  4. It's also necessary to manually add the file formats that your files will use. Many common file formats are already there by default. You can find them at /admin/data/dataformat.
  5. We then create the session for a given subject using the /sessions REST endpoint. Documentation can be found here, but you basically make an HTTP POST request with the subject name, date and number. You can also create the session through the admin interface.
  6. We then register the actual files to Alyx. For this the files need to be organized in a specific folder structure: they should be in a subject/date/number folder, for example mouse1/2021-05-27/001/log.raw.rtf. Using the /register-file endpoint, the files will be associated to the correct session based on this folder structure.
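Step 3's rule that each registered file must match exactly one dataset type can be checked locally before registering anything. The sketch below uses Python's fnmatch wildcards as a stand-in for the server-side matching (Alyx's exact rules, such as case sensitivity, are not described here), and the DATASET_TYPES mapping is a hypothetical example built from the 'raw log' pattern above plus one invented entry.

```python
from fnmatch import fnmatchcase

# Hypothetical dataset types: name -> filename pattern with wildcards.
DATASET_TYPES = {
    "raw log": "*log.raw*",
    "spike times": "spikes.times*",  # invented example entry
}


def match_dataset_type(filename: str, dataset_types: dict) -> str:
    """Return the single dataset type whose pattern matches filename.

    Raises ValueError when zero or several patterns match, mirroring the
    "exactly one dataset type" rule.
    """
    matches = [name for name, pattern in dataset_types.items()
               if fnmatchcase(filename, pattern)]
    if len(matches) != 1:
        raise ValueError(
            f"{filename!r} matches {len(matches)} dataset types: {matches}")
    return matches[0]
```

Both _rig1_log.raw.txt and log.raw.rtf resolve to 'raw log' here, because * may also match an empty string.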

You can make REST queries through the MATLAB API and Python. The important thing is to register the files after you've created the subject and session; otherwise they may not be linked to the correct metadata.
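Because registration (step 6) relies on the subject/date/number folder structure, a quick local check can catch misplaced files before any REST request is made. This is a minimal sketch, assuming a three-digit session number and an ISO date as in the mouse1/2021-05-27/001 example; the regular expression and the SessionPath helper are hypothetical, not Alyx's own parser.

```python
import re
from datetime import date
from typing import NamedTuple

# subject/yyyy-mm-dd/nnn followed by "/" or the end of the string (assumed layout).
SESSION_RE = re.compile(
    r"(?P<subject>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<number>\d{3})(?:/|$)")


class SessionPath(NamedTuple):
    subject: str
    date: date
    number: int
    relative_file: str  # path of the file inside the session folder


def parse_session_path(path: str) -> SessionPath:
    """Split e.g. 'mouse1/2021-05-27/001/log.raw.rtf' into its parts."""
    m = SESSION_RE.search(path)
    if m is None:
        raise ValueError(f"no subject/date/number folder in {path!r}")
    return SessionPath(m["subject"], date.fromisoformat(m["date"]),
                       int(m["number"]), path[m.end():])
```

A leading data root such as /data/Subjects/ is skipped, since the search looks for the first subject/date/number triple anywhere in the path.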

I hope that helps!

gmaggi commented 3 years ago

Hi @k1o0 and @kdharris101, thank you for the feedback!

I've followed the steps indicated above by @k1o0. However, I cannot manage to set "Data->Datasets" as "on server": when I select the "add dataset" option, the "on server" option is always disabled (see attachment). I wonder if I am missing something in the settings. The only thing I've changed so far to make it work with our resources is ALLOWED_HOSTS in alyx/alyx/settings_lab.py:

from textwrap import dedent

# ALYX-SPECIFIC
ALLOWED_HOSTS = ['nerfsql01vm']
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'GB'
GLOBUS_CLIENT_ID = '525cc517-8ccb-4d11-8036-af332da5eafd'
SUBJECT_REQUEST_EMAIL_FROM = 'alyx-noreply@cortexlab.net'
DEFAULT_SOURCE = 'Cruciform BSU'
DEFAULT_PROTOCOL = '1'
SUPERUSERS = ('root',)
STOCK_MANAGERS = ('charu',)
WEIGHT_THRESHOLD = 0.75
DEFAULT_LAB_NAME = 'cortexlab'
DEFAULT_LAB_PK = '4027da48-7be3-43ec-a222-f75dffe36872'
SESSION_REPO_URL = \
    "http://ibl.flatironinstitute.org/{lab}/Subjects/{subject}/{date}/{number:03d}/"
NARRATIVE_TEMPLATES = {
    'Headplate implant': dedent('''
    == General ==
...
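SESSION_REPO_URL in the settings above is an ordinary Python format string. The sketch below only shows how its placeholders expand (note that {number:03d} zero-pads the session number to match folders such as 001); the helper function and the argument values are hypothetical, and how Alyx itself uses this setting is not covered in this thread.

```python
# Template copied from settings_lab.py; still points at the IBL server.
SESSION_REPO_URL = \
    "http://ibl.flatironinstitute.org/{lab}/Subjects/{subject}/{date}/{number:03d}/"


def session_repo_url(lab: str, subject: str, date: str, number: int) -> str:
    """Fill the template for one session (hypothetical helper)."""
    # {number:03d} turns 1 into "001", matching the session folder names.
    return SESSION_REPO_URL.format(lab=lab, subject=subject, date=date,
                                   number=number)
```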

Could you give me an extra hand with this, please?

[Screen Shot 2021-06-02 at 10.30.41 AM]

k1o0 commented 3 years ago

There are two models: datasets and dataset types. In step 3 above, it's the dataset types that we add through the admin interface. The datasets themselves are registered through REST. The on_server field is calculated, so you can't set it manually. A dataset is shown as online if at least one of its associated file records has its exists field set to true and is on a data repository with a non-null data URL field.
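The rule for the calculated on_server value can be written out as a small predicate. In this sketch the two dataclasses are plain-Python stand-ins for Alyx's Django models, keeping only the fields mentioned in this thread (a file record's exists flag and its repository's data URL); treating an empty string like a null URL is an assumption that also covers the "non-empty data url field" wording in step 2.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DataRepository:  # stand-in, not the real Alyx model
    name: str
    data_url: Optional[str]  # None or "" means no URL configured


@dataclass
class FileRecord:  # stand-in, not the real Alyx model
    data_repository: DataRepository
    exists: bool


def is_online(file_records: Iterable[FileRecord]) -> bool:
    """True if any record exists on a repository that has a data URL."""
    return any(fr.exists and bool(fr.data_repository.data_url)
               for fr in file_records)
```

This is why the "on server" box cannot be ticked by hand: it only turns on once a registered file record with exists set to true points at a repository whose URL is filled in.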

dervinism commented 2 years ago

Hi k1o0,

I have installed Alyx as a local database and am trying to use it to store my own data, possibly extending it later to the entire lab. I've run into the same issue as described here: I am unable to register my files, and I cannot associate a dataset with a session. I followed steps 1 to 5 as you outlined, but I cannot get through step 6. The MATLAB REST API appears to be non-functional, and I raised an issue about it (https://github.com/cortex-lab/alyx-matlab/issues/80). Now I am trying to use the Python interface but am not able to create an AlyxClient object. Executing this code

from one.webclient import AlyxClient
alyx = AlyxClient(base_url='http://localhost:8000/admin', username='admin', password='admin', cache_dir=None, silent=False, cache_rest='GET')

Gives the following error:

$ /home/user/anaconda3/envs/iblenv/bin/python alyxRegisterFiles.py
Traceback (most recent call last):
  File "alyxRegisterFiles.py", line 3, in <module>
    alyx = AlyxClient(base_url='http://localhost:8000/admin', username='admin', password='admin', cache_dir=None, silent=False, cache_rest='GET')
  File "/home/user/anaconda3/envs/iblenv/lib/python3.10/site-packages/one/webclient.py", line 502, in __init__
    self.authenticate(username, password)
  File "/home/user/anaconda3/envs/iblenv/lib/python3.10/site-packages/one/webclient.py", line 630, in authenticate
    rep.raise_for_status()
  File "/home/user/anaconda3/envs/iblenv/lib/python3.10/site-packages/requests/models.py", line 960, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://localhost:8000/admin/auth-token

What is a correct way to set up AlyxClient?

Also, a related issue: I think a tutorial on how to upload files to the Alyx database using the graphical interface would be really useful for new people trying to adopt Alyx for their data management. The existing documentation assumes a high level of familiarity with the Alyx database, which, it seems to me, would hinder its widespread adoption.

dervinism commented 2 years ago

Registering files worked after correcting base_url to base_url='http://localhost:8000'

k1o0 commented 2 years ago

Hi, yes, the 'admin' part of the URL refers to the admin interface, which is for browsing and editing the database in the browser. Just use the root URL for REST requests through ONE and alyx-matlab.
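A small helper can make that mistake harder to repeat by reducing whatever URL a user copies from the browser to the root used for REST requests. This is a sketch using only the standard library; the function name is hypothetical, and it simply strips a trailing /admin segment along with any trailing slash, query string or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def rest_base_url(url: str) -> str:
    """Turn e.g. 'http://localhost:8000/admin' into 'http://localhost:8000'."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if path.endswith("/admin"):
        path = path[: -len("/admin")]  # drop the browser-only admin segment
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The result would then be passed as base_url, for example AlyxClient(base_url=rest_base_url("http://localhost:8000/admin"), username=..., password=...), with AlyxClient imported from one.webclient as in the snippet earlier in the thread.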

k1o0 commented 2 years ago

Unfortunately we don't have a system for registering files through the admin interface. You can do it, but it would be incredibly arduous. It's better to use the register-file endpoint. Endpoint documentation can be found here. You can use the registration module in the ONE-API or the registerFile method of alyx-matlab.
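When calling the endpoint directly rather than through the ONE registration module, it can help to group files by folder first, assuming each request registers files under a single relative path. The sketch below does that grouping; the payload field names ('created_by', 'path', 'filenames', 'name') and the comma-separated filename format are assumptions about the register-file endpoint and should be checked against the endpoint documentation for your Alyx version.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def register_file_payloads(relative_paths, repository: str, user: str):
    """Build one (hypothetical) register-file payload per folder.

    relative_paths are paths relative to the data repository root,
    e.g. 'mouse1/2021-05-27/001/log.raw.rtf'.
    """
    groups = defaultdict(list)  # folder -> file names, in input order
    for rel in relative_paths:
        p = PurePosixPath(rel)
        groups[str(p.parent)].append(p.name)
    # Field names below are assumptions; verify them on your server.
    return [
        {"created_by": user, "path": folder,
         "filenames": ",".join(names), "name": repository}
        for folder, names in groups.items()
    ]
```

Each payload could then be sent with an authenticated AlyxClient, for example alyx.post('/register-file', data=payload), once the subject and session exist.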

dervinism commented 2 years ago

Thanks for the quick reply. I have also submitted a related issue (https://github.com/int-brain-lab/iblenv/issues/305). I suspect I am doing something wrong, but I'm not exactly sure. Again, this comes back to the issue of being able to enter all the required information in the Alyx database correctly, which is difficult without a tutorial.