European-XFEL / DAMNIT

Data And Metadata iNspection Interactive Thing
https://damnit.rtfd.io
BSD 3-Clause "New" or "Revised" License

Logs misreport processing of Slurm variables #115

Open matheuscteo opened 1 year ago

matheuscteo commented 1 year ago

CC @bermudei

Related to https://github.com/European-XFEL/DAMNIT/issues/100: when reprocessing with --match for a Slurm (cluster) variable, for example, the logs do not indicate that the job has started to run; instead they read as though no match was found.

This might lead to the user running several jobs by mistake.

takluyver commented 1 year ago

This is what I see when doing this:

$ amore-proto reprocess --match zimuth 148
2023-10-26 16:11:25 INFO     extra_data.read_machinery              Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.038 s
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0017 s
INFO:__main__:Using 0 variables (of 22) from context file
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0016 s
INFO:extra_data.read_machinery:Found proposal dir '/gpfs/exfel/exp/FXE/202302/p004507' in 0.0018 s
INFO:__main__:Writing 1 variables to 2 datasets in extracted_data/p4507_r148.h5
INFO:__main__:Writing 1 variables to 1 datasets in /tmp/tmpd02jlmxi/reduced.h5
2023-10-26 16:11:59 INFO     damnit.backend.extract_data            Reduced data has 1 fields
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Adding p4507 r148 to database, with 1 columns
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Sent Kafka update to topic 'amore-db-1959f8c5f91fae45c425b1dbb19d42c0e91d36b0'
2023-10-26 16:12:00 INFO     damnit.backend.extract_data            Launched Slurm job 3855048
 to calculate cluster variables

That doesn't seem so bad? Admittedly it says 'Using 0 variables', but the last line has 'Launched Slurm job'.

JamesWrigley commented 1 year ago

I think the problem is that the message doesn't specify that it's selecting 0 variables to process locally, and that more might be used in the Slurm job. We could probably improve the logs in general by printing the names of all the variables that were selected, which would be particularly useful with --match.
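
To make the idea concrete, here is a minimal sketch of logging the selected names separately for local and Slurm processing. It is not DAMNIT's actual code: the `variables` mapping, its `title` and `cluster` keys, the case-insensitive substring matching, and all function names are assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)


def select_variables(variables, match=()):
    """Split the selected variables into (local, cluster) name lists.

    `variables` is a hypothetical mapping of name -> {"title": str, "cluster": bool}.
    A variable is selected if any `match` substring occurs in its title
    (case-insensitive, an assumption); with no `match`, all are selected.
    """
    local, cluster = [], []
    for name, info in variables.items():
        title = info["title"].lower()
        if match and not any(m.lower() in title for m in match):
            continue
        (cluster if info["cluster"] else local).append(name)
    return local, cluster


def describe_selection(local, cluster, total):
    """Build one message per processing stage, naming the variables."""
    def names(group):
        return ", ".join(group) if group else "(none)"

    return [
        f"Processing {len(local)} of {total} variables locally: {names(local)}",
        f"Leaving {len(cluster)} of {total} variables for the Slurm job: {names(cluster)}",
    ]


def log_selection(variables, match=()):
    """Select variables and log which ones run where."""
    local, cluster = select_variables(variables, match)
    for message in describe_selection(local, cluster, len(variables)):
        log.info(message)
    return local, cluster
```

With messages like these, a `--match` that only hits cluster variables would still report zero local variables, but the second line would name what the Slurm job is going to compute.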

matheuscteo commented 1 year ago

Yes, the phrase 'Using 0 variables' was what generated the confusion today.

takluyver commented 1 year ago

OK, got it. I'm hoping to work on logs today (if other stuff doesn't get in the way, as it often does).

takluyver commented 1 year ago

This should now show up as:

INFO:__main__:Using 0 variables (of 22) from context file (cluster variables will be processed later)

Are we happy enough with that message to close this issue?

bermudei commented 1 year ago

As James commented, is it possible to print the names of the variables being used, both locally and in Slurm? I know it might be too much, especially when "all variables" are being processed, but I think it would be very useful.
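
One way to keep the output manageable when everything is selected would be to cap the number of names printed. The helper below is only a sketch of that idea; `format_names` and its `limit` parameter are hypothetical and not part of DAMNIT.

```python
def format_names(names, limit=10):
    """Join variable names for a log line, truncating long lists.

    Shows at most `limit` names and summarises the remainder as "(+N more)".
    """
    names = list(names)
    if not names:
        return "(none)"
    shown = ", ".join(names[:limit])
    extra = len(names) - limit
    return f"{shown} (+{extra} more)" if extra > 0 else shown
```

A short selection (the typical `--match` case) would be listed in full, while a run over all variables would show the first few names and a count of the rest.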