LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License

Error when kachery sharing old analysis nwb files #914

Closed samuelbray32 closed 7 months ago

samuelbray32 commented 7 months ago

Describe the bug

Solution? @CBroz1, do you know if the datajoint filepath for an entry can be determined without raising a file not found error? If so we can use this to define where the files should be saved to when downloading from kachery to ensure consistency with the source database.

To Reproduce: on a remote client connected to the franklab database:

from spyglass.linearization.v0 import IntervalLinearizedPosition
from spyglass.common import PositionIntervalMap

# nwb_file_name and interval_list_name must already be defined for the session of interest
key = {"nwb_file_name": nwb_file_name, "interval_list_name": interval_list_name}
pos_interval = (PositionIntervalMap & key).fetch1("position_interval_name")
lin_pos_key = {
    "nwb_file_name": nwb_file_name,
    "interval_list_name": pos_interval,
    "position_info_param_name": "default_decoding",
}
(IntervalLinearizedPosition & lin_pos_key).fetch_nwb()

Error Stack

{
    "name": "FileNotFoundError",
    "message": "[Errno 2] No such file or directory: '/Users/samuelbray/Documents/analysis/j1620210710_FRL083NP3E.nwb'",
    "stack": "---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/Users/samuelbray/Documents/frank_lab/code/Uri_20240220.ipynb Cell 15 line 1
     10 pos_interval = (PositionIntervalMap & key).fetch1(\"position_interval_name\")
     11 lin_pos_key = {\"nwb_file_name\": nwb_file_name,
     12            \"interval_list_name\": pos_interval,
     13            \"position_info_param_name\":\"default_decoding\"
     14            }
---> 16 (IntervalLinearizedPosition & lin_pos_key).fetch_nwb()

File ~/Documents/frank_lab/code/spyglass/src/spyglass/utils/dj_mixin.py:128, in SpyglassMixin.fetch_nwb(self, *attrs, **kwargs)
    120 def fetch_nwb(self, *attrs, **kwargs):
    121     \"\"\"Fetch NWBFile object from relevant table.
    122 
    123     Implementing class must have a foreign key reference to Nwbfile or
   (...)
    126     precedence.
    127     \"\"\"
--> 128     return fetch_nwb(self, self._nwb_table_tuple, *attrs, **kwargs)

File ~/Documents/frank_lab/code/spyglass/src/spyglass/utils/dj_helper_fn.py:189, in fetch_nwb(query_expression, nwb_master, *attrs, **kwargs)
    185     if not os.path.exists(file_path):
    186         # retrieve the file from kachery. This also opens the file and stores the file object
    187         get_nwb_file(file_path)
--> 189 rec_dicts = (
    190     query_expression * tbl.proj(nwb2load_filepath=attr_name)
    191 ).fetch(*attrs, \"nwb2load_filepath\", **kwargs)
    193 if not rec_dicts or not np.any(
    194     [\"object_id\" in key for key in rec_dicts[0]]
    195 ):
    196     return rec_dicts

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:231, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    229 attributes = [a for a in attrs if not is_key(a)]
    230 ret = self._expression.proj(*attributes)
--> 231 ret = ret.fetch(
    232     offset=offset,
    233     limit=limit,
    234     order_by=order_by,
    235     as_dict=False,
    236     squeeze=squeeze,
    237     download_path=download_path,
    238     format=\"array\",
    239 )
    240 if attrs_as_dict:
    241     ret = [
    242         {k: v for k, v in zip(ret.dtype.names, x) if k in attrs}
    243         for x in ret
    244     ]

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:291, in Fetch.__call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
    288     raise e
    289 for name in heading:
    290     # unpack blobs and externals
--> 291     ret[name] = list(map(partial(get, heading[name]), ret[name]))
    292 if format == \"frame\":
    293     ret = pandas.DataFrame(ret).set_index(heading.primary_key)

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/fetch.py:64, in _get(connection, attr, data, squeeze, download_path)
     61 adapt = attr.adapter.get if attr.adapter else lambda x: x
     63 if attr.is_filepath:
---> 64     return adapt(extern.download_filepath(uuid.UUID(bytes=data))[0])
     65 if attr.is_attachment:
     66     # Steps:
     67     # 1. get the attachment filename
     68     # 2. check if the file already exists at download_path, verify checksum
     69     # 3. if exists and checksum passes then return the local filepath
     70     # 4. Otherwise, download the remote file and return the new filepath
     71     _uuid = uuid.UUID(bytes=data) if attr.is_external else None

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/external.py:330, in ExternalTable.download_filepath(self, filepath_hash)
    324 file_exists = Path(local_filepath).is_file() and (
    325     not _need_checksum(local_filepath, size)
    326     or uuid_from_file(local_filepath) == contents_hash
    327 )
    329 if not file_exists:
--> 330     self._download_file(external_path, local_filepath)
    331     if (
    332         _need_checksum(local_filepath, size)
    333         and uuid_from_file(local_filepath) != contents_hash
    334     ):
    335         # this should never happen without outside interference
    336         raise DataJointError(
    337             f\"'{local_filepath}' downloaded but did not pass checksum.\"
    338         )

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/external.py:128, in ExternalTable._download_file(self, external_path, download_path)
    126     self.s3.fget(external_path, download_path)
    127 elif self.spec[\"protocol\"] == \"file\":
--> 128     safe_copy(external_path, download_path)
    129 else:
    130     assert False

File ~/miniforge3/envs/spyglass/lib/python3.9/site-packages/datajoint/utils.py:115, in safe_copy(src, dest, overwrite)
    113 dest.parent.mkdir(parents=True, exist_ok=True)
    114 temp_file = dest.with_suffix(dest.suffix + \".copying\")
--> 115 shutil.copyfile(str(src), str(temp_file))
    116 temp_file.rename(dest)

File ~/miniforge3/envs/spyglass/lib/python3.9/shutil.py:264, in copyfile(src, dst, follow_symlinks)
    262     os.symlink(os.readlink(src), dst)
    263 else:
--> 264     with open(src, 'rb') as fsrc:
    265         try:
    266             with open(dst, 'wb') as fdst:
    267                 # macOS

FileNotFoundError: [Errno 2] No such file or directory: '/Users/samuelbray/Documents/analysis/j1620210710_FRL083NP3E.nwb'"
}
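
The last frames of the traceback show the error coming from `shutil.copyfile` opening its source path, so the path in the message is the location DataJoint expected the file to be at. The following is a minimal stand-alone sketch of that failure mode, not DataJoint's actual code: `copy_like_safe_copy` is a simplified imitation of the copy step shown above, and the helper name and temporary paths are hypothetical. It also shows that the exception's `filename` attribute carries the missing path, so the message text does not need to be parsed.

```python
import shutil
import tempfile
from pathlib import Path


def copy_like_safe_copy(src, dest):
    """Simplified imitation of the copy step in the traceback (assumption)."""
    src, dest = Path(src), Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    temp_file = dest.with_suffix(dest.suffix + ".copying")
    # Raises FileNotFoundError when the source file does not exist.
    shutil.copyfile(str(src), str(temp_file))
    temp_file.rename(dest)


def missing_source_path(src, dest):
    """Return the path reported by FileNotFoundError, or None if the copy works."""
    try:
        copy_like_safe_copy(src, dest)
    except FileNotFoundError as err:
        return err.filename
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for the expected location and the copy target.
    expected = Path(tmp) / "analysis" / "example.nwb"
    dest = Path(tmp) / "stage" / "example.nwb"
    reported = missing_source_path(expected, dest)
    print(reported)
```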

Additional context

Note that the file is downloaded, but into a subdirectory rather than the path DataJoint expects. After running the code above, the following executes without error:

from spyglass.common import AnalysisNwbfile
analysis_file = (IntervalLinearizedPosition & lin_pos_key).fetch1("analysis_file_name")
path = AnalysisNwbfile().get_abs_path(analysis_file)

import os
assert os.path.exists(path)

samuelbray32 commented 7 months ago

One solution is to catch this error and, when it occurs, move the downloaded file to the correct location. Here's an example:

import os
import shutil

from spyglass.common import AnalysisNwbfile

table = IntervalLinearizedPosition & lin_pos_key

try:
    table.fetch_nwb()
except FileNotFoundError as e:
    # get the location stored as a datajoint filepath (the path named in the error)
    dj_path = e.filename
    print(dj_path)
    # get the location where AnalysisNwbfile.kachery would have stored it
    analysis_file = table.fetch1("analysis_file_name")
    current_path = AnalysisNwbfile().get_abs_path(analysis_file)
    assert os.path.exists(current_path)
    # move the file to the datajoint location
    # this will change the output of future calls to AnalysisNwbfile().get_abs_path(analysis_file)
    shutil.move(current_path, dj_path)
    table.fetch_nwb()

Once this has run once, future calls to fetch_nwb for the analysis file will work fine. One option would be to put a version of this check in AnalysisNwbfileKachery so the problem is resolved in the background when the file is downloaded.
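
The catch, move, and retry pattern can also be sketched as a reusable helper that is independent of spyglass. This is only an illustration under stated assumptions: `fetch_with_relocation` and both of its callable arguments are hypothetical names, and the demo uses temporary files instead of a database. In practice `fetch` might wrap `table.fetch_nwb()` and `locate_downloaded` might wrap `AnalysisNwbfile().get_abs_path(analysis_file)`.

```python
import os
import shutil
import tempfile


def fetch_with_relocation(fetch, locate_downloaded):
    """Call fetch(); on FileNotFoundError move the file into place and retry once.

    fetch: zero-argument callable that raises FileNotFoundError if the file
        is not at the expected path (hypothetical stand-in for fetch_nwb).
    locate_downloaded: zero-argument callable returning the path where the
        file was actually saved (hypothetical stand-in for get_abs_path).
    """
    try:
        return fetch()
    except FileNotFoundError as err:
        expected_path = err.filename  # path the failing open() was given
        current_path = locate_downloaded()
        if expected_path is None or not os.path.exists(current_path):
            raise  # nothing to relocate, surface the original error
        parent = os.path.dirname(expected_path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        shutil.move(current_path, expected_path)
        return fetch()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: the file was saved one directory too deep.
    expected = os.path.join(tmp, "analysis", "example.nwb")
    downloaded = os.path.join(tmp, "analysis", "subdir", "example.nwb")
    os.makedirs(os.path.dirname(downloaded))
    with open(downloaded, "wb") as f:
        f.write(b"data")

    def read_expected():
        with open(expected, "rb") as f:
            return f.read()

    result = fetch_with_relocation(read_expected, lambda: downloaded)
    moved = os.path.exists(expected) and not os.path.exists(downloaded)
```

Re-raising when there is nothing to relocate keeps unrelated missing-file errors visible instead of hiding them behind the workaround.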

@edeno, do you have a sense of whether this will show up often enough that we should put this into spyglass, or should this issue just serve as the workaround for people to apply on a case-by-case basis?

edeno commented 7 months ago

I think it would be okay to have this in AnalysisNwbfileKachery for now. It would be good to document clearly in the code why the check is happening.

samuelbray32 commented 7 months ago

Came up with a cleaner solution in the PR above: fixing the abs_path returned by AnalysisNwbfile.get_abs_path() so that it agrees with the datajoint entries. Since this path is what defines where kachery saves the file in the first place, it fixes the issue before it happens.