NREL / buildstockbatch


Can't run sample simulation on bsb v2023.01.0 #360

Closed phgupta closed 7 months ago

phgupta commented 1 year ago

**Describe the bug**
I recently downloaded the latest versions of buildstockbatch (v2023.01.0) and resstock (v3.0), and I'm having trouble running a sample simulation on AWS.

The simulation always errors out at the `fs.put(...)` call in `aws.py`.

**To Reproduce**

1. This is the YAML file I'm currently using:

```yaml
schema_version: '0.3'
buildstock_directory: ../resstock-3.0.0  # Relative to this file or absolute
project_directory: project_national  # Relative to buildstock_directory
output_directory: bsb_results
# weather_files_url: https://data.nrel.gov/system/files/156/BuildStock_TMY3_FIPS.zip
weather_files_path: weather/BuildStock_TMY3_FIPS.zip  # Relative to this file or absolute path to zipped weather files

baseline:
  n_buildings_represented: 81221016

sampler:
  type: precomputed
  args:
    sample_file: precomputed_files/buildstock_10.csv

workflow_generator:
  type: residential_hpxml
  args:
    build_existing_model:
      simulation_control_run_period_calendar_year: 2010
    simulation_output_report:
      timeseries_frequency: hourly

upgrades:

aws:
  # The job_identifier should be unique, start with alpha, not include dashes,
  # and be limited to 10 chars or data loss can occur
  job_identifier: test_proj
  s3:
    bucket: xyz
    prefix: demo/example2
  emr:
    worker_instance_count: 1
  region: us-west-2
  use_spot: false
  batch_array_size: 2
  # To receive email updates on job progress accept the request to receive
  # emails that will be sent from Amazon
  notifications_email: abc@xyz.com

postprocessing:
  aws:
    region_name: 'us-west-2'
    s3:
      bucket: xyz
      prefix: demo/example2
    athena:
      glue_service_role: service-role/AWSGlueServiceRole-default
      database_name: testing
      max_crawling_time: 300  # time to wait for the crawler to complete before aborting it
```


**Logs**
These are the logs from AWS Batch (NOTE: I have added a couple of log statements):

```
INFO:2023-03-27 18:38:17:__main__:Archiving simulation outputs
DEBUG:2023-03-27 18:38:17:__main__:Clearing out simulation directory
DEBUG:2023-03-27 18:38:17:__main__:sim_dir: /var/simdata/openstudio
DEBUG:2023-03-27 18:38:17:__main__:asset_dirs: ['resources', 'measures', 'weather', 'lib']
DEBUG:2023-03-27 18:38:17:__main__:s3fs version: 2023.3.0
DEBUG:2023-03-27 18:38:17:__main__:simulation_output_tar_filename: /var/simdata/simulation_outputs.tar.gz
DEBUG:2023-03-27 18:38:17:__main__:bucket: eshu-icf2
DEBUG:2023-03-27 18:38:17:__main__:prefix: demo/example2
DEBUG:2023-03-27 18:38:17:__main__:job_id: 0
/usr/local/lib/python3.8/site-packages/botocore/utils.py:1720: FutureWarning: The S3RegionRedirector class has been deprecated for a new internal replacement. A future version of botocore may remove this class.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 112, in _error_wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/aiobotocore/client.py", line 358, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AllAccessDisabled) when calling the PutObject operation: All access to this object has been disabled

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/site-packages/buildstockbatch/aws/aws.py", line 2207, in <module>
    main()
  File "/usr/local/lib/python3.8/site-packages/buildstockbatch/utils.py", line 98, in run_with_error_capture
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/buildstockbatch/aws/aws.py", line 2176, in main
    AwsBatch.run_job(job_id, s3_bucket, s3_prefix, job_name, region)
  File "/usr/local/lib/python3.8/site-packages/buildstockbatch/aws/aws.py", line 2117, in run_job
    fs.put(
  File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 115, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in sync
    raise return_result
  File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 514, in _put
    return await _run_coros_in_chunks(
  File "/usr/local/lib/python3.8/site-packages/fsspec/asyn.py", line 246, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut
  File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 1110, in _put_file
    await self._call_s3(
  File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 347, in _call_s3
    return await _error_wrapper(
  File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 139, in _error_wrapper
    raise err
PermissionError: All access to this object has been disabled
```

This is the line raising the error in `aws.py`:

```python
fs.put(
    str(simulation_output_tar_filename),
    f'{bucket}/{prefix}/results/simulation_output/simulations_job{job_id}.tar.gz'
)
```


This does not appear to be a permissions issue, since a new folder does get created in S3 containing assets.tar.gz, config.json, emr/, etc. (though no results folder). If it were a permissions issue, the run should have failed right at the beginning.

Also, I ran the above `fs.put(...)` line locally (with Python 3.8 and s3fs v2023.3.0) and was able to successfully transfer a local file to the correct S3 bucket.
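For anyone wanting to repeat that local check, here is a minimal standalone sketch of the failing upload step. The helper names are mine, not buildstockbatch's; only the destination key format mirrors the `fs.put(...)` call in `run_job`, and actually running the upload requires valid AWS credentials:

```python
def output_key(bucket: str, prefix: str, job_id: int) -> str:
    """Build the same destination key that aws.py passes to fs.put()."""
    return f"{bucket}/{prefix}/results/simulation_output/simulations_job{job_id}.tar.gz"

def upload_simulation_outputs(tarball_path: str, bucket: str, prefix: str, job_id: int) -> None:
    """Upload a local tarball to S3 the way run_job does (hypothetical helper)."""
    import s3fs  # imported lazily so output_key() works without s3fs installed
    fs = s3fs.S3FileSystem()  # credentials come from the environment / instance role
    fs.put(tarball_path, output_key(bucket, prefix, job_id))

print(output_key("xyz", "demo/example2", 0))
# → xyz/demo/example2/results/simulation_output/simulations_job0.tar.gz
```

Running `upload_simulation_outputs(...)` from a laptop with the same bucket and prefix is a quick way to separate a bucket-policy problem from a problem with the container's credentials.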

**Platform (please complete the following information):**
- Simulation platform: AWS
- BuildStockBatch version, branch, or sha: v2023.01.0
- resstock or comstock repo version, branch, or sha: v3.0
- Local Desktop OS: Windows WSL2
phgupta commented 1 year ago

I fixed the issue by replacing the `install_requires` variable in `setup.py` with the one from buildstockbatch v0.21.
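Since the fix turned out to be a dependency-pin change, it can help to compare the package versions inside the Batch container image against a known-good environment. A small generic helper (not part of buildstockbatch; the package list is illustrative) for dumping the versions in question:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    result = {}
    for pkg in packages:
        try:
            result[pkg] = version(pkg)
        except PackageNotFoundError:
            result[pkg] = None
    return result

# Packages worth comparing against the pins in v0.21's setup.py
print(installed_versions(["s3fs", "fsspec", "boto3", "dask"]))
```

The `DEBUG:...:s3fs version: 2023.3.0` line in the logs above came from exactly this kind of check.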

nmerket commented 1 year ago

There are a lot of things broken in the AWS batch workflow right now. We're working on it in #345 as we have time.

nmerket commented 7 months ago

We got the workflow working (although it's still very beta) in #345. It has been merged down to develop. Go ahead and give that a go.