cancervariants / gene-normalization

Services and guidelines for normalizing genes
https://gene-normalizer.readthedocs.io/latest/
MIT License

Download issues for NCBI history data #321

Status: Closed (jsstevenson closed this issue 3 months ago)

jsstevenson commented 9 months ago

This might be related to the on-prem network and FTP usage:

Loading NCBI...
Traceback (most recent call last):
  File "/Users/jss009/code/gene-normalizer/venv/bin/gene_norm_update", line 8, in <module>
    sys.exit(update_normalizer_db())
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 292, in update_normalizer_db
    _update_normalizer(list(SourceName), db, update_merged, use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 128, in _update_normalizer
    _load_source(n, db, delete_time, processed_ids, use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 191, in _load_source
    processed_ids += source.perform_etl(use_existing)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 60, in perform_etl
    self._extract_data(use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/ncbi.py", line 227, in _extract_data
    self._history_src = self._acquire_data_file(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 105, in _acquire_data_file
    return download_callback()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/ncbi.py", line 168, in _download_history_file
    version = self._ftp_download(
              ^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 163, in _ftp_download
    self._ftp_download_file(ftp, data_fn, source_dir, fn)
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 185, in _ftp_download_file
    shutil.copyfileobj(f_in, f_out)
  File "/Users/jss009/.pyenv/versions/3.11.0/lib/python3.11/shutil.py", line 197, in copyfileobj
    buf = fsrc_read(length)
          ^^^^^^^^^^^^^^^^^
  File "/Users/jss009/.pyenv/versions/3.11.0/lib/python3.11/gzip.py", line 301, in read
    return self._buffer.read(size)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/.pyenv/versions/3.11.0/lib/python3.11/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/.pyenv/versions/3.11.0/lib/python3.11/gzip.py", line 507, in read
    uncompress = self._decompressor.decompress(buf, size)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
zlib.error: Error -3 while decompressing data: invalid block type
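
The zlib.error above is raised from the gzip module inside shutil.copyfileobj while the NCBI history archive is being decompressed, so a corrupted download surfaces as an opaque low-level error. A minimal sketch of one way to report this more clearly, assuming a standalone helper (decompress_gzip_checked and CorruptDownloadError are hypothetical names, not part of gene-normalizer): decompress inside a try block, map zlib.error, gzip.BadGzipFile and EOFError (the errors a bad header, bad deflate data, or truncation typically produce) to one descriptive exception, and delete any partially written output.

```python
import gzip
import shutil
import zlib
from pathlib import Path


class CorruptDownloadError(Exception):
    """Hypothetical error type: a downloaded .gz file failed to decompress."""


def decompress_gzip_checked(src: Path, dest: Path) -> Path:
    """Decompress src into dest, converting gzip/zlib failures into one clear error.

    Hypothetical helper. A corrupted archive can fail with zlib.error (bad deflate
    data, e.g. "invalid block type"), gzip.BadGzipFile (bad header or CRC), or
    EOFError (truncated file). On failure the partial output is removed so that
    a half-written file is not mistaken for a good one later.
    """
    try:
        with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    except (zlib.error, gzip.BadGzipFile, EOFError) as exc:
        dest.unlink(missing_ok=True)  # drop the partially written output
        raise CorruptDownloadError(f"{src} appears to be corrupted: {exc}") from exc
    return dest
```

Catching the error at this layer keeps the traceback short and makes it obvious that the download, not the parser, is at fault.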

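Additionally, I'm getting a different error when running with --use_existing. It looks like a string formatting issue: the path below appears to be missing a separator between ncbi and ncbi_history_20240128.tsv (it reads ncbincbi_history_20240128.tsv):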

Loading NCBI...
Traceback (most recent call last):
  File "/Users/jss009/code/gene-normalizer/venv/bin/gene_norm_update", line 8, in <module>
    sys.exit(update_normalizer_db())
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 315, in update_normalizer_db
    _update_normalizer(parsed_source_names, db, update_merged, use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 128, in _update_normalizer
    _load_source(n, db, delete_time, processed_ids, use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/cli.py", line 191, in _load_source
    processed_ids += source.perform_etl(use_existing)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 60, in perform_etl
    self._extract_data(use_existing)
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/ncbi.py", line 227, in _extract_data
    self._history_src = self._acquire_data_file(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jss009/code/gene-normalizer/src/gene/etl/base.py", line 101, in _acquire_data_file
    raise FileNotFoundError(
FileNotFoundError: No local files matching pattern file:///Users/jss009/code/gene-normalizer/src/gene/data/ncbincbi_history_20240128.tsv
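
The path in this FileNotFoundError looks like the directory name ncbi and the filename ncbi_history_20240128.tsv concatenated as plain strings with no separator. Assuming that is the cause, building paths with pathlib's / operator avoids the problem. The sketch below uses hypothetical helper names and an assumed data/<source>/<source>_history_<version>.tsv layout; it is not the project's actual lookup code.

```python
from pathlib import Path


def history_file_path(data_dir: Path, source: str, version: str) -> Path:
    """Build the expected local history file path (hypothetical layout).

    Plain string concatenation such as str(data_dir / source) + f"{source}_history_{version}.tsv"
    produces ".../ncbincbi_history_20240128.tsv"; the / operator always inserts a separator.
    """
    return data_dir / source / f"{source}_history_{version}.tsv"


def find_latest_history_file(data_dir: Path, source: str) -> Path:
    """Return the newest local history file for source, or raise FileNotFoundError.

    Versions are assumed to be YYYYMMDD strings, so lexicographic order is date order.
    """
    pattern = f"{source}_history_*.tsv"
    candidates = sorted((data_dir / source).glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No local files matching pattern {data_dir / source / pattern}")
    return candidates[-1]
```

Globbing for the newest matching file, rather than reconstructing one exact name, also lets a --use_existing style run tolerate a version string that differs from the expected one.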

korikuzma commented 9 months ago

An external collaborator from Dana-Farber also experienced this.

jsstevenson commented 9 months ago

Update: I tried again both from home and via VPN, and everything seems to be working again. I wonder whether this is an NCBI FTP issue (accessing the files via HTTPS was working for me earlier, while FTP was yielding corrupted downloads), so I've created https://github.com/GenomicMedLab/wags-tails/issues/25
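
Since HTTPS worked while FTP produced corrupted archives, one possible mitigation is to validate each download and fall back to another transport. This is a sketch only: download_gz_with_fallback and gzip_is_intact are hypothetical names, the fetch parameter defaults to urllib.request.urlretrieve (which accepts both ftp:// and https:// URLs), and this is not how wags-tails actually acquires NCBI files.

```python
import gzip
import urllib.request
import zlib
from pathlib import Path
from typing import Callable, Sequence

# A fetcher takes (url, local_filename) and writes the file; urlretrieve fits this shape.
Fetch = Callable[[str, str], object]


def gzip_is_intact(path: Path) -> bool:
    """Read the whole archive; any header, deflate, CRC, or truncation error means corrupt."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):  # stream in 1 MiB chunks until EOF
                pass
    except (OSError, zlib.error, EOFError):  # gzip.BadGzipFile is an OSError subclass
        return False
    return True


def download_gz_with_fallback(
    urls: Sequence[str], dest: Path, fetch: Fetch = urllib.request.urlretrieve
) -> str:
    """Try each URL in order and keep the first download that decompresses cleanly.

    Hypothetical helper. Returns the URL that succeeded; raises RuntimeError if none did.
    """
    errors = []
    tmp = dest.parent / (dest.name + ".part")  # never leave a bad file at dest
    for url in urls:
        try:
            fetch(url, str(tmp))
        except OSError as exc:  # urllib.error.URLError subclasses OSError
            errors.append(f"{url}: {exc}")
            continue
        if gzip_is_intact(tmp):
            tmp.replace(dest)
            return url
        tmp.unlink(missing_ok=True)
        errors.append(f"{url}: corrupted gzip archive")
    raise RuntimeError("All sources failed:\n" + "\n".join(errors))
```

For example, the URL list could hold the FTP and HTTPS forms of the same NCBI file, such as ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_history.gz followed by https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_history.gz (assumed locations; check them against the NCBI site). Injecting fetch keeps the retry logic testable without network access.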

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed if no further activity occurs in 14 days.

github-actions[bot] commented 3 months ago

This issue was closed because it has been stalled for 14 days with no activity.