NeuromatchAcademy / course-content

NMA Computational Neuroscience course
https://compneuro.neuromatch.io
Creative Commons Attribution 4.0 International

Allen hot fix #1116

Closed iamzoltan closed 2 months ago

iamzoltan commented 2 months ago

@steevelaquitaine @yavorska-iryna - it's the same issue as before.


projects/neurons/load_Allen_Visual_Behavior_from_SDK.ipynb failed quality control.
An error occurred while executing the following cell:
------------------
data_storage_directory = "/temp"  # Note: this path must exist on your local drive
cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=data_storage_directory)
------------------

---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
Cell In[3], line 2
      1 data_storage_directory = "/temp"  # Note: this path must exist on your local drive
----> 2 cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=data_storage_directory)

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/allensdk/brain_observatory/behavior/behavior_project_cache/project_cache_base.py:77, in ProjectCacheBase.from_s3_cache(cls, cache_dir, bucket_name_override)
     50 @classmethod
     51 def from_s3_cache(
     52         cls,
     53         cache_dir: Union[str, Path],
     54         bucket_name_override: Optional[str] = None
     55 ) -> "ProjectCacheBase":
     56     """instantiates this object with a connection to an s3 bucket and/or
     57     a local cache related to that bucket.
     58 
   (...)
     74 
     75     """
---> 77     fetch_api = cls.cloud_api_class().from_s3_cache(
     78         cache_dir,
     79         bucket_name=(
     80             bucket_name_override if bucket_name_override is not None
     81             else cls.BUCKET_NAME),
     82         project_name=cls.PROJECT_NAME,
     83         ui_class_name=cls.__name__)
     85     return cls(fetch_api=fetch_api)

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/allensdk/brain_observatory/behavior/behavior_project_cache/project_apis/data_io/project_cloud_api_base.py:108, in ProjectCloudApiBase.from_s3_cache(cls, cache_dir, bucket_name, project_name, ui_class_name)
     78 @classmethod
     79 def from_s3_cache(cls, cache_dir: Union[str, Path],
     80                   bucket_name: str,
     81                   project_name: str,
     82                   ui_class_name: str) -> "ProjectCloudApiBase":
     83     """instantiates this object with a connection to an s3 bucket and/or
     84     a local cache related to that bucket.
     85 
   (...)
    106 
    107     """
--> 108     cache = S3CloudCache(cache_dir,
    109                          bucket_name,
    110                          project_name,
    111                          ui_class_name=ui_class_name)
    112     return cls(cache)

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/allensdk/api/cloud_cache/cloud_cache.py:1066, in S3CloudCache.__init__(self, cache_dir, bucket_name, project_name, ui_class_name)
   1063     self._manifest = None
   1064     self._bucket_name = bucket_name
-> 1066     super().__init__(cache_dir=cache_dir, project_name=project_name,
   1067                      ui_class_name=ui_class_name)

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/allensdk/api/cloud_cache/cloud_cache.py:391, in CloudCacheBase.__init__(self, cache_dir, project_name, ui_class_name)
    390 def __init__(self, cache_dir, project_name, ui_class_name=None):
--> 391     super().__init__(cache_dir=cache_dir, project_name=project_name,
    392                      ui_class_name=ui_class_name)
    394     # what latest_manifest was the last time an OutdatedManifestWarning
    395     # was emitted
    396     self._manifest_last_warned_on = None

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/allensdk/api/cloud_cache/cloud_cache.py:63, in BasicLocalCache.__init__(self, cache_dir, project_name, ui_class_name)
     57 def __init__(
     58     self,
     59     cache_dir: Union[str, Path],
     60     project_name: str,
     61     ui_class_name: Optional[str] = None
     62 ):
---> 63     os.makedirs(cache_dir, exist_ok=True)
     65     # the class users are actually interacting with
     66     # (for warning message purposes)
     67     if ui_class_name is None:

File /opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/os.py:225, in makedirs(name, mode, exist_ok)
    223         return
    224 try:
--> 225     mkdir(name, mode)
    226 except OSError:
    227     # Cannot rely on checking for EEXIST, since the operating system
    228     # could give priority to other errors like EACCES or EROFS
    229     if not exist_ok or not path.isdir(name):

PermissionError: [Errno 13] Permission denied: '/temp'

iamzoltan commented 2 months ago

This can probably be fixed by replacing /temp with a directory relative to the current working directory.

steevelaquitaine commented 2 months ago

If I understand correctly, this should be fixable by calling os.mkdir("/temp") before the following lines:

data_storage_directory = "/temp"  # Note: this path must exist on your local drive
cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=data_storage_directory)

Can you check @yavorska-iryna? Thanks a lot in advance.

yavorska-iryna commented 2 months ago

We have not seen this issue when we ran the notebook. Steeve's fix may work. I need to test it.

yavorska-iryna commented 2 months ago

@iamzoltan I don't run into the same permission error when I test the notebook in JupyterLab or Colab. I changed the name of the folder and the code still ran, so I don't think os.mkdir would fix it. Since I can't replicate this error, it's hard for me to fix it.
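
The traceback above shows that the SDK already calls os.makedirs(cache_dir, exist_ok=True), so the failure looks like a write-permission problem at the parent location rather than a missing directory. A minimal sketch for checking this up front follows; `nearest_existing_parent` and `can_create` are hypothetical helper names, not part of allensdk, and os.access is only an approximation of what the OS will actually allow:

```python
import os
from pathlib import Path


def nearest_existing_parent(path):
    """Walk upward from `path` until a component that exists is found."""
    p = Path(path).resolve()
    while not p.exists():
        p = p.parent
    return p


def can_create(path):
    """Best-effort check that `path` could be created by the current user.

    Creating a directory requires write and search (execute) permission on
    the closest ancestor that already exists.
    """
    parent = nearest_existing_parent(path)
    return parent.is_dir() and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # "/temp" sits directly under the filesystem root, which an unprivileged
    # CI user usually cannot write to; "./temp" sits under the working directory.
    for candidate in ("/temp", "./temp"):
        print(candidate, "->", can_create(candidate))
```

If the check returns False for "/temp" on the CI runner, adding an explicit os.mkdir("/temp") would hit the same PermissionError.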

iamzoltan commented 2 months ago

Yes, I understand. We are talking about the processing environment on GH. Can you change the location to ./temp, and let's see if that fixes the issue?

yavorska-iryna commented 2 months ago

Let me look into it.

yavorska-iryna commented 2 months ago

@iamzoltan I changed the path to "./temp" and merged it to my forked branch. Let me know if that works.

matchings commented 2 months ago

I just tested the notebook on the allen-hot-fix branch with the following change to the cache loading and it worked as expected:

data_storage_directory = "./temp"
cache = VisualBehaviorOphysProjectCache.from_s3_cache(cache_dir=data_storage_directory)

A folder called 'temp' was created in the same directory as the notebook and the cache loaded properly.
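
A minimal sketch of the relative-path approach, for anyone who prefers to create the folder explicitly before handing it to the SDK. `make_cache_dir` is a hypothetical helper, not an allensdk function; the SDK call is left as a comment because it needs allensdk and network access:

```python
from pathlib import Path


def make_cache_dir(name="temp", base=None):
    """Create (if needed) and return a cache directory under `base`.

    `base` defaults to the current working directory, which for a notebook
    is normally the folder the notebook lives in.
    """
    base_dir = Path(base) if base is not None else Path.cwd()
    cache_dir = base_dir / name
    # exist_ok=True makes repeated runs of the cell harmless.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


if __name__ == "__main__":
    data_storage_directory = make_cache_dir("temp")
    print("cache directory:", data_storage_directory)
    # With allensdk installed, the notebook line would then read:
    # cache = VisualBehaviorOphysProjectCache.from_s3_cache(
    #     cache_dir=data_storage_directory)
```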

iamzoltan commented 2 months ago

Looks like the process is failing after adding in the fMRI fixes

iamzoltan commented 2 months ago

Actually, I can't get the Allen notebooks to run locally. I get this error:

     41 # query on valid_roi if exclude_invalid_rois == True
     42 if exclude_invalid_rois:
---> 43     cell_specimen_table = ophys_experiment.cell_specimen_table.query('valid_roi').reset_index()  # noqa E501
     44 else:
     45     cell_specimen_table = ophys_experiment.cell_specimen_table.reset_index()  # noqa E501

Cell In[16], line 20, in <lambda>(self, expr, **kwargs)
     18 pd.set_option('display.max_columns', 500)
     19 # this line may be needed if you run into Error in pandas query function
---> 20 pd.DataFrame.query = lambda self, expr, **kwargs: self.query(expr, engine='python', **kwargs) 

Cell In[16], line 20, in <lambda>(self, expr, **kwargs)
     18 pd.set_option('display.max_columns', 500)
     19 # this line may be needed if you run into Error in pandas query function
---> 20 pd.DataFrame.query = lambda self, expr, **kwargs: self.query(expr, engine='python', **kwargs) 

TypeError: __main__.<lambda>() got multiple values for keyword argument 'engine'

yavorska-iryna commented 2 months ago

@iamzoltan this line can be removed:

pd.DataFrame.query = lambda self, expr, **kwargs: self.query(expr, engine='python', **kwargs)

It was added because in some instances pandas used the numexpr engine and the query function didn't work. The same problem can also be fixed by passing engine='python' directly when calling DataFrame.query.
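
To illustrate why the patched line raises the TypeError shown in the traceback, here is a small sketch that does not use pandas. `Table` is a hypothetical stand-in for a DataFrame. The point is that `self.query` inside the replacement resolves to the replacement itself, so on the second pass `engine` arrives both explicitly and through `**kwargs`. The `safe_query` variant is one possible way to write such a patch, not something taken from the notebook:

```python
class Table:
    """Hypothetical stand-in for a DataFrame with a `query` method."""

    def query(self, expr, engine="numexpr"):
        return expr, engine


# Keep a reference to the real method before replacing it.
_original_query = Table.query


def broken_query(self, expr, **kwargs):
    # Same shape as the notebook's lambda: `self.query` is now this very
    # function, so the nested call passes `engine` twice.
    return self.query(expr, engine="python", **kwargs)


Table.query = broken_query
try:
    Table().query("valid_roi")
    broken_error = None
except TypeError as exc:
    broken_error = exc


def safe_query(self, expr, **kwargs):
    # Only fill in the engine if the caller did not choose one, and call
    # the saved original instead of looking up `self.query` again.
    kwargs.setdefault("engine", "python")
    return _original_query(self, expr, **kwargs)


Table.query = safe_query
result = Table().query("valid_roi")

if __name__ == "__main__":
    print("broken patch raised:", broken_error)
    print("safe patch returned:", result)
```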

iamzoltan commented 2 months ago

I am sorting it out. Now we are running out of space, which is a good error. I will clear some space and try again.
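
A quick way to see how much room is left before the cache starts downloading is sketched below with the standard library. The helper names and the 5 GiB threshold are assumptions for illustration only; the thread does not say how much space the Allen data needs:

```python
import shutil


def free_gib(path="."):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path=".", required_gib=5.0):
    """True if at least `required_gib` GiB are free at `path`.

    The default threshold is an arbitrary placeholder.
    """
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"free space here: {free_gib('.'):.1f} GiB")
    print("enough for the (assumed) 5 GiB threshold:", has_room("."))
```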