dingo-gw / dingo

Dingo: Deep inference for gravitational-wave observations
MIT License

Fixing dingo_pipe to be compatible with bilby_pipe==1.3 #234

Closed by nihargupte-ph 9 months ago

nihargupte-ph commented 10 months ago

The newest merge request to bilby_pipe, https://git.ligo.org/lscsoft/bilby_pipe/-/merge_requests/580/diffs?commit_id=fec7dd7f4df06c2fd2686042fd97ab7773f198af, has broken the current version of dingo_pipe. This PR modifies dingo_pipe to be consistent with bilby_pipe==1.3, which involves adding extra arguments required by the new version of bilby_pipe.

nihargupte-ph commented 10 months ago

To run, we need to apply the following steps (found also by @hectorestelles):

condor_vault_storer -v "igwn"
htgettoken -a vault.ligo.org -i igwn
kinit yourligo.user@LIGO.ORG
export GWDATAFIND_SERVER=https://datafind.igwn.org/

However, note that the last step actually needs to be specified differently in the config.ini file. In particular, make sure to add to your original ini file:

environment-variables={GWDATAFIND_SERVER : datafind.igwn.org}

We also need to download LDAS-framecpp, and then submit the DAG:

condor_submit_dag outdir_S230805x/submit/dag_S230805x.submit
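The datafind-server step can also be mirrored in-process before any data fetch. A minimal Python sketch, assuming the same server as the ini entry above (the helper function is hypothetical, not part of dingo_pipe):

```python
import os

# Mirror the ini-file environment-variables entry in the process environment
# so that gwdatafind/gwpy queries go to the IGWN datafind server.
os.environ["GWDATAFIND_SERVER"] = "datafind.igwn.org"

def gwdatafind_server() -> str:
    """Return the datafind server currently in effect (hypothetical helper)."""
    return os.environ.get("GWDATAFIND_SERVER", "")

print(gwdatafind_server())  # datafind.igwn.org
```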

We could add this to the tutorials, but I'm not sure it would apply to people who are not in the LVK. We need to make a ROTA settings file at some point so this could probably go in there instead.

nihargupte-ph commented 10 months ago

Currently this will only work if you specify --local-generation. The Kerberos credentials are not being transferred properly to the node which does the data generation.
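Assuming dingo_pipe follows bilby_pipe's ini conventions, local generation can also be requested in the config file rather than on the command line (option name taken from bilby_pipe; verify against your version):

```ini
local-generation = True
```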

nihargupte-ph commented 9 months ago

> Thanks Nihar, I only suggest one minor change. I assume you have tested that this works?
>
> I also made a change to pyproject.toml to require bilby_pipe>=1.3.

Right, it seems to work on CIT. The one weird thing I notice is that the data generation complains a bit in the error files a few times before downloading the data, something like:

03:13 dingo_pipe INFO    : Running: gwpy.timeseries.TimeSeries.get(channel='L1:GDS-CALIB_STRAIN_CLEAN', start=1379195844.025879, end=1379195852.025879, verbose=False, allow_tape=True, ).astype(dtype='float64', subok=True, copy=False, )
/home/nihar.gupte/environments/envs/dingo-devel/lib/python3.9/site-packages/gwpy/io/datafind.py:400: UserWarning: failed to read channels for type 'L1_HOFT_C00': Unable to open file: /cvmfs/ligo.storage.igwn.org/igwn/ligo/frames/O4/hoft_C00/L1/L-L1_HOFT_C00-137/L-L1_HOFT_C00-1379192832-4096.gwf ( errno=13 (Permission denied)):
  warnings.warn(
/home/nihar.gupte/environments/envs/dingo-devel/lib/python3.9/site-packages/gwpy/io/datafind.py:400: UserWarning: failed to read channels for type 'L1_HOFT_C00_AR': Unable to open file: /cvmfs/ligo.storage.igwn.org/igwn/ligo/frames/O4/hoft_C00_AR/L1/L-L1_HOFT_C00_AR-137/L-L1_HOFT_C00_AR-1379192832-4096.gwf ( errno=13 (Permission denied)):
  warnings.warn(
Error in write(): Connection refused
Error in write(): Connection refused
/home/nihar.gupte/environments/envs/dingo-devel/lib/python3.9/site-packages/gwpy/timeseries/core.py:1142: NDSWarning: failed to fetch data for L1:GDS-CALIB_STRAIN_CLEAN in interval [1379195844.025879, 1379195852.025879): Failed to establish a connection[INFO: Error occurred trying to write to socket]
  warnings.warn(
03:13 dingo_pipe INFO    : Resampling data to sampling_frequency 4096.0 using lal
03:13 dingo_pipe INFO    : Using default PSD start time -256.0 relative to start time
03:13 dingo_pipe INFO    : Completed data generation

but it seems to generate the data in the end, so I think it's ok. I am now running inference on another event just to check the changes above. It was able to download data successfully and is now sampling.
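Since these datafind warnings are non-fatal (gwpy falls back to another source and the fetch still succeeds), one way to make them easier to audit is to record them around the fetch call. A sketch with a stand-in for the gwpy call; the wrapper and stub are hypothetical, not dingo_pipe code:

```python
import warnings

def fetch_with_warning_log(fetch, *args, **kwargs):
    """Run a data-fetch callable and collect any warnings it emits,
    instead of letting them scroll past in the error files."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        data = fetch(*args, **kwargs)
    return data, [str(w.message) for w in caught]

# Stand-in for gwpy.timeseries.TimeSeries.get, which may warn about
# unreadable frame files before succeeding via a different server.
def fake_get(channel):
    warnings.warn("failed to read channels for type 'L1_HOFT_C00'")
    return [0.0, 1.0, 2.0]

data, notes = fetch_with_warning_log(fake_get, "L1:GDS-CALIB_STRAIN_CLEAN")
```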

stephengreen commented 9 months ago

This is maybe a question for the bilby_pipe developers.

nihargupte-ph commented 9 months ago

> Did you check also with GWOSC data?

Checking now. There seems to be an error with the dag, but only with the older version of condor. The dag contains this extra line:

ENV GET HTGETTOKENOPTS

If I remove this line I can download data just fine from GWOSC, but if I leave it there I get an error. This only happens on the AEI cluster; on the CIT cluster, having this extra line doesn't cause any issues and I am able to download data just fine. The error is of the form

2023-12-11T13:27:44 outdir_GW150914_2/submit/dag_GW150914.submit (line 18): ERROR: expected JOB, DATA, SUBDAG, FINAL, SCRIPT, PARENT, RETRY, ABORT-DAG-ON, DOT, VARS, PRIORITY, CATEGORY, MAXJOBS, CONFIG, SET_JOB_ATTR, SPLICE, PROVISIONER, SERVICE, NODE_STATUS_FILE, REJECT, JOBSTATE_LOG, PRE_SKIP, DONE, CONNECT, PIN_IN, PIN_OUT, INCLUDE or SUBMIT-DESCRIPTION token (found ENV)

implying that this is due to the older version of condor on the AEI cluster not recognizing the ENV command.
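One workaround on clusters whose DAGMan rejects the ENV command would be to strip those lines from the generated .dag file. A sketch; the helper is hypothetical, not part of dingo_pipe:

```python
# Older HTCondor DAGMan versions reject the ENV command when parsing a
# .dag file. Filtering those lines out lets the rest of the DAG submit.
def strip_env_commands(dag_text: str) -> str:
    kept = [line for line in dag_text.splitlines()
            if not line.lstrip().startswith("ENV ")]
    return "\n".join(kept)

# Illustrative DAG fragment (job names are made up).
dag = (
    "JOB generation outdir/submit/generation.submit\n"
    "ENV GET HTGETTOKENOPTS\n"
    "PARENT generation CHILD sampling\n"
)
print(strip_env_commands(dag))
```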

stephengreen commented 9 months ago

implying that this is due to the older version of condor on the AEI cluster not knowing the ENV command.

Okay, but if you use local = True then it runs presumably?

If you are happy then please merge the PR.

nihargupte-ph commented 9 months ago

> Okay, but if you use local = True then it runs presumably?
>
> If you are happy then please merge the PR.

Right, with local=True the data gets downloaded.

I can merge, but it does mean we have an incompatibility with CondorVersion: 10.0.9 2023-09-28. I could open another PR to fix that issue, but I'd rather it work on CIT than on the AEI cluster. Even though I run on the AEI cluster, people wanting to reproduce results would presumably run on CIT, so I think this makes the most sense.
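If a follow-up PR does address this, one option would be to gate emission of the ENV command on the detected HTCondor version string. A sketch; the minimum version below is a placeholder, not the actual release in which DAGMan gained the ENV command:

```python
def parse_condor_version(version_string: str) -> tuple:
    """Parse e.g. 'CondorVersion: 10.0.9 2023-09-28' into (10, 0, 9)."""
    number = version_string.split()[1]
    return tuple(int(part) for part in number.split("."))

def supports_dag_env(version_string: str, minimum=(10, 2, 0)) -> bool:
    """Decide whether to emit DAG ENV lines (minimum is a placeholder)."""
    return parse_condor_version(version_string) >= minimum

print(supports_dag_env("CondorVersion: 10.0.9 2023-09-28"))  # False
```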