Closed GernotMaier closed 2 years ago
This happens without using the DB right?
'useMongoDB: true', so that should not be the issue.
I really struggle setting all the values in config.yml correctly, maybe we can improve the documentation and the error messages.
e.g., instead of writing:
ERROR::config(l84)::get::Config does not contain dataLocation
Write:
ERROR::config(l84)::get::Config keyword dataLocation not found in file <full path to file>
It took me a while to find out that dataLocation is a required keyword (for some reason I had testdataLocation in my config.yaml file), and then I did not know from which directory the config.yml file was read.
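A sketch of what such a lookup could look like (illustrative only, not the actual simtools/config.py code; `entries` stands in for the parsed YAML content):

```python
from pathlib import Path


class ConfigError(Exception):
    """Raised when a required keyword is missing from the config file."""


class Config:
    """Sketch of a config reader whose error names both the missing
    keyword and the file it was read from."""

    def __init__(self, entries, config_file):
        self._entries = entries
        self._config_file = Path(config_file).absolute()

    def get(self, key):
        try:
            return self._entries[key]
        except KeyError:
            raise ConfigError(
                "Config keyword {} not found in file {}".format(key, self._config_file)
            ) from None


# A config.yml that (mistakenly) contains testdataLocation instead of dataLocation:
cfg = Config({"testdataLocation": "data"}, "config.yml")
try:
    cfg.get("dataLocation")
except ConfigError as exc:
    print(exc)  # names both the keyword and the full path to config.yml
```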
Note that the full error message here is very confusing:
python gammasim-tools/applications/derive_mirror_rnda.py --site North --telescope MST-FlashCam-D --mean_d80 1.4 --sig_d80 0.16 --mirror_list mirror_MST_focal_lengths.dat --d80_list mirror_MST_D80.dat --rnda 0.0075
maierg@warp.zeuthen.desy.de's password:
ERROR::config(l84)::get::Config does not contain dataLocation
Traceback (most recent call last):
File "gammasim-tools/applications/derive_mirror_rnda.py", line 305, in <module>
meanD80, sigD80 = run(rndaStart)
File "gammasim-tools/applications/derive_mirror_rnda.py", line 235, in run
ray = RayTracing.fromKwargs(
File "/workdir/external/gammasim-tools/simtools/ray_tracing.py", line 198, in fromKwargs
return cls(**args, configData=configData)
File "/workdir/external/gammasim-tools/simtools/ray_tracing.py", line 126, in __init__
_parameterFile = io.getDataFile("parameters", "ray-tracing_parameters.yml")
File "/workdir/external/gammasim-tools/simtools/io_handler.py", line 187, in getDataFile
Path(cfg.get("dataLocation")).joinpath(parentDir).joinpath(fileName).absolute()
File "/workdir/external/gammasim-tools/simtools/config.py", line 85, in get
raise KeyError()
KeyError
INFO::db_handler(l187)::_closeSSHTunnel::Closing SSH tunnel(s)
It doesn't find a keyword in a config.yml, but most of the error message is about a database connection and ssh issues.
Don't worry about the error message itself. I think all the issues I have are that files are not found in the modelFilesLocations. This is probably an error in how I set it up on my side (and we need to get this stuff into the database).
I think I get it now: dataLocation needs to point to 'gammasim-tools/data' and it contains configuration files required to run gammasim-tools. That was completely unexpected!
As a user, I did not expect that anything else is needed when running e.g.,
python gammasim-tools/applications/derive_mirror_rnda.py --site North --telescope MST-FlashCam-D --mean_d80 1.4 --sig_d80 0.16 --mirror_list mirror_MST_focal_lengths.dat --d80_list mirror_MST_D80.dat --rnda 0.0075
This is quite a rich command line (nothing wrong with that), but it does not indicate that a file named 'data/parameters/ray-tracing_parameters.yml' is required to run the application.
Why did it work for me a couple of months ago? I think the reason is that now I don't start the applications from the gammasim-tools directory but from somewhere else (which means the default paths are not set correctly).
> I think I get it now: dataLocation needs to point to 'gammasim-tools/data' and it contains configuration files required to run gammasim-tools. That was completely unexpected!
The 'data' directory is needed for many things. It is where we store many parameter files, layout files, test files, etc.
I made it a config entry because we might want to move it somewhere else in the future, but for now we keep it inside the main repo.
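For illustration, the relevant part of a config.yml could then look like this (only the keyword names appear in this thread; the values are placeholders to adapt to your checkout):

```yaml
# Placeholder values -- adapt the paths to your local setup.
useMongoDB: true
dataLocation: /full/path/to/gammasim-tools/data
modelFilesLocations: /full/path/to/model-files
```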
> As a user, I did not expect that anything else is needed when running e.g., […]
> This is quite a rich command line (nothing wrong with that), but it does not indicate that a file named 'data/parameters/ray-tracing_parameters.yml' is required to run the application.
It should not. This file is needed any time one runs the ray-tracing module.
> Why did it work for me a couple of months ago? I think the reason is that now I don't start the applications from the gammasim-tools directory but from somewhere else (which means the default paths are not set correctly).
Yes, that is probably the reason. We could use the full path as the default for the data dir in the config file. I will open an issue about it.
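One way to do that would be to resolve the default relative to the package's own location rather than the current working directory (a sketch; `cfg` here is a plain dict, not the simtools config object):

```python
from pathlib import Path

# Resolve the default data directory relative to this source file, so the
# result does not depend on where the application is started from.
# In the real package this file would live inside the gammasim-tools repo.
REPO_ROOT = Path(__file__).resolve().parent
DEFAULT_DATA_LOCATION = REPO_ROOT / "data"


def get_data_location(cfg):
    """Return dataLocation from the config dict, falling back to the
    absolute in-repo default."""
    return Path(cfg.get("dataLocation", DEFAULT_DATA_LOCATION)).absolute()
```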
> Note that the full error message here is very confusing: […]
> It doesn't find a keyword in a config.yml, but most of the error message is about a database connection and ssh issues.
The error is as clear as we can make it. It is a bit confusing because of Python. The large block about the KeyError is inevitable, and the one line about the DB is there because we close the connection after the error. What we can do is make the line "ERROR::config(l84)::get::Config does not contain dataLocation" a bit clearer, as you suggested. I will create an issue for that and fix it next.
Is it working in the end, @GernotMaier?
> Is it working in the end, @GernotMaier?
Yes, but it took me two hours. I think this shows you that the setup is not ideal for users who are not very familiar with the system.
> Note that the full error message here is very confusing: […]
> It doesn't find a keyword in a config.yml, but most of the error message is about a database connection and ssh issues.

> The error is as clear as we can make it. […] What we can do is make the line "ERROR::config(l84)::get::Config does not contain dataLocation" a bit clearer, as you suggested.
Why can't the program exit in a clean way when a configuration file has not been found? This would be much clearer.
> Note that the full error message here is very confusing: […]
> The error is as clear as we can make it. […]

> Why can't the program exit in a clean way when a configuration file has not been found? This would be much clearer.
I will try to get rid of this big block about the KeyError, but I'm not sure whether that is possible. To me this message looks like a pretty common Python error. And it is actually clearer than in many other packages, because we added the customized message about the config file, which is not hard to find. Other packages would leave the error handling to Python, which means there would be only a standard KeyError message.
And we can remove the SSH tunnel message if you find it distracting (make it a DEBUG level).
It simply doesn't make sense to have this type of error and confuse the users. I strongly suggest having clear error messages, catching those errors early, and exiting without the Python errors.
Error messages should be helpful. And here we don't really have an unexpected exception: it is a file not found, which we can handle and then exit.
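Handled at the application level, that could look like this (a sketch; the exception name and `run_application` are placeholders, not the simtools API):

```python
import logging
import sys

logger = logging.getLogger("application")


class MissingConfigEntryError(Exception):
    """Placeholder for 'required keyword not found in config.yml'."""


def run_application(body):
    """Run the application body; on a missing config entry, log one
    clear line and exit with a non-zero status instead of a traceback."""
    try:
        return body()
    except MissingConfigEntryError as exc:
        logger.error("%s (check your config.yml)", exc)
        sys.exit(1)
```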
On the issue of the 'data' directory: I don't want to start a discussion about directory names, but I think it would have been clearer if it weren't called 'data' but maybe 'configuration'?
Seeing the full output of this application, I get now the following error at the end.
Is the connection to the DB lost when running an application which takes several minutes? This is running from home on my laptop.
INFO::ray_tracing(l227)::simulate::Simulating RayTracing for offAxis=0.0, mirror=170
INFO::simtel_runner(l173)::run::Running (1x) with command:/workdir/sim_telarray/sim_telarray/bin/sim_telarray -c /workdir/external/output/simtools-output/derive_mirror_rnda/model/CTA-North-MST-FlashCam-D-prod4_derive_mirror_rnda.cfg -I../cfg/CTA -C IMAGING_LIST=/workdir/external/output/simtools-output/derive_mirror_rnda/ray-tracing/photons-North-MST-FlashCam-D-d0.0-za20.0-off0.000_mirror170_derive_mirror_rnda.lis -C stars=/workdir/external/output/simtools-output/derive_mirror_rnda/ray-tracing/stars-North-MST-FlashCam-D-d0.0-za20.0-off0.000_mirror170_derive_mirror_rnda.lis -C altitude=2150.0 -C telescope_theta=20.0 -C star_photons=10000 -C telescope_phi=0 -C camera_transmission=1.0 -C nightsky_background=all:0. -C trigger_current_limit=1e10 -C telescope_random_angle=0 -C telescope_random_error=0 -C convergent_depth=0 -C maximum_telescopes=1 -C show=all -C camera_filter=none -C focus_offset=all:0. -C camera_config_file=single_pixel_camera.dat -C camera_pixels=1 -C trigger_pixels=1 -C camera_body_diameter=0 -C mirror_list=/workdir/external/output/simtools-output/derive_mirror_rnda/model/CTA-single-mirror-list-North-MST-FlashCam-D-prod4-mirror170_derive_mirror_rnda.dat -C focal_length=3214.0 -C dish_shape_length=1607.0 -C mirror_focal_length=1607.0 -C parabolic_dish=0 -C mirror_align_random_distance=0. -C mirror_align_random_vertical=0.,28.,0.,0. /workdir/sim_telarray/run9991.corsika.gz 2>&1 > /workdir/external/output/simtools-output/derive_mirror_rnda/ray-tracing/log-North-MST-FlashCam-D-d0.0-za20.0-off0.000_mirror170_derive_mirror_rnda.log 2>&1
INFO::ray_tracing(l227)::simulate::Simulating RayTracing for offAxis=0.0, mirror=171
client_loop: send disconnect: Broken pipe
Traceback (most recent call last):
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1514, in _retryable_read
server = self._select_server(
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1346, in _select_server
server = topology.select_server(server_selector)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/topology.py", line 244, in select_server
return random.choice(self.select_servers(selector,
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/topology.py", line 202, in select_servers
server_descriptions = self._select_servers_loop(
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/topology.py", line 218, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: localhost:27018: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 6200d849ebe36888bd560be0, topology_type: Single, servers: [<ServerDescription ('localhost', 27018) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27018: [Errno 111] Connection refused')>]>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "gammasim-tools/applications/derive_mirror_rnda.py", line 311, in <module>
meanD80, sigD80 = run(newRnda)
File "gammasim-tools/applications/derive_mirror_rnda.py", line 241, in run
ray.simulate(test=False, force=True) # force has to be True, always
File "/workdir/external/gammasim-tools/simtools/ray_tracing.py", line 232, in simulate
simtel = SimtelRunnerRayTracing(
File "/workdir/external/gammasim-tools/simtools/simtel/simtel_runner_ray_tracing.py", line 119, in __init__
self._loadRequiredFiles()
File "/workdir/external/gammasim-tools/simtools/simtel/simtel_runner_ray_tracing.py", line 154, in _loadRequiredFiles
"# configFile = {}\n".format(self.telescopeModel.getConfigFile())
File "/workdir/external/gammasim-tools/simtools/model/telescope_model.py", line 525, in getConfigFile
self.exportConfigFile()
File "/workdir/external/gammasim-tools/simtools/model/telescope_model.py", line 502, in exportConfigFile
self.exportModelFiles()
File "/workdir/external/gammasim-tools/simtools/model/telescope_model.py", line 496, in exportModelFiles
db.exportModelFiles(parsFromDB, self._configFileDirectory)
File "/workdir/external/gammasim-tools/simtools/db_handler.py", line 265, in exportModelFiles
self._writeFileFromMongoToDisk(
File "/workdir/external/gammasim-tools/simtools/db_handler.py", line 685, in _writeFileFromMongoToDisk
fsOutput.download_to_stream_by_name(file.filename, outputFile)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/gridfs/__init__.py", line 910, in download_to_stream_by_name
for chunk in gout:
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/gridfs/grid_file.py", line 802, in next
chunk = self.__chunk_iter.next()
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/gridfs/grid_file.py", line 755, in next
chunk = self._next_with_retry()
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/gridfs/grid_file.py", line 747, in _next_with_retry
return self._cursor.next()
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/cursor.py", line 1238, in next
if len(self.__data) or self._refresh():
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/cursor.py", line 1155, in _refresh
self.__send_message(q)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/cursor.py", line 1044, in __send_message
response = client._run_operation(
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1424, in _run_operation
return self._retryable_read(
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1531, in _retryable_read
raise last_error
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1525, in _retryable_read
return func(session, server, sock_info, secondary_ok)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1420, in _cmd
return server.run_operation(
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/server.py", line 114, in run_operation
reply = sock_info.receive_message(request_id)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/pool.py", line 753, in receive_message
self._raise_connection_failure(error)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/pool.py", line 751, in receive_message
return receive_message(self, request_id, self.max_message_size)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/network.py", line 215, in receive_message
data = _receive_data_on_socket(sock_info, length - 16, deadline)
File "/conda/envs/gammasim-tools-dev/lib/python3.8/site-packages/pymongo/network.py", line 293, in _receive_data_on_socket
raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed
INFO::db_handler(l187)::_closeSSHTunnel::Closing SSH tunnel(s)
Interesting, never seen this issue. I wonder if we are closing the connection after the first mirror. Raul mentioned he found a bug though, so let's wait and see whether that is the issue or an actual DB connection problem.
It is working normally with me, after I fixed a bug.
Let me check this on the WGS. At home on my laptop, it shows this error after a while. The main reason is probably a timeout of the ssh connection.
If that's the case, I will have to modify the tunnel or DB connection parameters so that it doesn't happen. Let me know and I will try to recreate then.
The error message is
pymongo.errors.AutoReconnect: connection closed
Looking around, it seems worth trying to keep the connection alive, see e.g. here
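The "client_loop: send disconnect: Broken pipe" line above points at the ssh tunnel itself dropping; client-side keepalives are one thing to try, e.g. in ~/.ssh/config (the host name is taken from the log above; the values are just reasonable starting points, not tested against this setup):

```
Host warp.zeuthen.desy.de
    ServerAliveInterval 60
    ServerAliveCountMax 5
```

With these settings the ssh client sends a keepalive probe every 60 seconds and only gives up after 5 unanswered probes, which should survive long idle stretches between DB reads.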
It could also be an issue of trying to open a second connection when running the second mirror (instead of using the original connection). I will try to reproduce and debug.
Couldn't recreate, neither from the WGS nor from my laptop using the usual dev container. Running on the WGS took about 10 minutes, running on the laptop took significantly longer (1.5-2 hours). In neither case did I get this issue.
However, following your error message, I do see some strange behaviour in the code. If I understand correctly (@RaulRPrado can correct me if I am wrong), every time we run the ray tracing on one mirror, we read the model from the DB and export a config file. I am not sure why we do that for every mirror. I assume the entire config file does not change; instead we change just one or two parameters in it. Why don't we export the file once and then, if necessary, edit the exported one prior to each run? Alternatively, why not reuse the model we already read once and have in memory (i.e., a dict)?
I am pretty sure that reading from the DB for every run makes the execution much slower, especially when running from home. Should we open an issue to improve this behaviour?
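Caching the DB read per telescope model would avoid the repeated queries; a minimal sketch (all names here are invented for illustration, the actual gammasim-tools interfaces differ):

```python
class FakeDB:
    """Stand-in for the MongoDB handler; counts how often it is queried."""

    def __init__(self):
        self.calls = 0

    def read_model_parameters(self, site, telescope):
        self.calls += 1
        return {"focal_length": 1607.0, "mirror_list": "mirror_list.dat"}


class TelescopeModel:
    """Sketch: query the DB once and reuse the in-memory dict for every
    subsequent per-mirror run."""

    def __init__(self, db, site, telescope):
        self._db = db
        self._site = site
        self._telescope = telescope
        self._parameters = None  # filled on first access

    def get_parameters(self):
        # Query the DB only once; later ray-tracing runs (one per mirror)
        # reuse the cached dict instead of going through the tunnel again.
        if self._parameters is None:
            self._parameters = self._db.read_model_parameters(
                self._site, self._telescope
            )
        return self._parameters


db = FakeDB()
model = TelescopeModel(db, "North", "MST-FlashCam-D")
for _mirror in range(3):  # three "runs" trigger only a single DB read
    model.get_parameters()
print(db.calls)  # 1
```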
Note that on your laptop, the kerberos ticketing is probably working (it stopped working probably after an update on mine, and I didn't have time to fix it). Maybe this is the difference - I would suggest not to dig further.
Anything improving efficiency is good in my opinion.
Actually no, the token did not work. I think that the update to Monterey modified the way OSX saves the krb token. Spent 15 minutes trying to fix it and gave up. So our setup was equivalent.
However, I can imagine that keeping a tunnel open for 2 hours could cause an issue. We can both try to extend the timeout and to make the code more efficient. The former requires help from the DB manager. The latter I will try to figure out in the next few days (unless @RaulRPrado has a reason why it cannot be done).
> Couldn't recreate, neither from the WGS nor from my laptop using the usual dev container. […]
> I am pretty sure that reading from the DB for every run makes the execution much slower, especially when running from home. Should we open an issue to improve this behaviour?
That should be easy to fix. I will work on it.
I will create another issue for that, so you can close this one.
OK, I am closing this. The underlying issue of prolonging the timeout of the DB might still be there, but if we fix this issue we might never encounter it again and maybe increasing the timeout unnecessarily isn't a good idea.
derive_mirror_rnda is not working for me, see the error below.
There is a 'fix this' note at that line in the code, so this is probably known?