ESGF / esg-publisher

ESGF Publisher
http://esg-publisher.readthedocs.org/

esgpublish prints a huge amount of log output #237

Open skremdwd opened 1 month ago

skremdwd commented 1 month ago

A normal import process of around 900 files created a total output/log file of around 700 MiB on disk.

Many lines appear to be repeated over and over with identical content, and each message is printed one more time after every imported dataset.

This may be related to #170?
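For illustration only: one common way to get this kind of growth with Python's `logging` module is to attach a fresh handler to the same logger on every iteration. Whether that is what happens inside the publisher is not confirmed here. The names `make_logger` and `count_lines` below are hypothetical and are not part of esg-publisher.

```python
import io
import logging


def make_logger(name, stream, guard):
    """Return a logger writing to `stream`.

    Without `guard`, a new handler is attached on every call, so each
    message is emitted once per attached handler.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from the root logger
    if not (guard and logger.handlers):
        logger.addHandler(logging.StreamHandler(stream))
    return logger


def count_lines(n_datasets, guard):
    """Simulate n_datasets publish steps and count the emitted log lines."""
    stream = io.StringIO()
    name = "demo_guarded" if guard else "demo_unguarded"
    for _ in range(n_datasets):
        log = make_logger(name, stream, guard)
        log.info("Making dataset...")
    # Detach the handlers again so repeated calls start from a clean logger.
    logger = logging.getLogger(name)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    return len(stream.getvalue().splitlines())
```

In this sketch the unguarded variant writes 1 + 2 + ... + n lines for n datasets, so the output grows quadratically, while the guarded variant writes n lines. For 900 iterations that would be 900 * 901 / 2 = 405,450 copies of a single message, which would be consistent with a log of several hundred MiB.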

Log output

The excerpt below shows what the output looks like. Some parts seem to be reprinted, but I am not sure why; I can't find a related problem in the publog code.

esgpublish --no-replica --project climatepredictionsde --map ~/import-test/map/ --config ~/.esg/projects/esg.climatepredictionsde.yaml --no-auth &> ~/log.txt
[...]
Making dataset...
[New dataset gets imported]
Making dataset...
Making dataset...
[New dataset gets imported]
Making dataset...
Making dataset...
Making dataset...
[New dataset gets imported]
Making dataset...
Making dataset...
Making dataset...
Making dataset...
[...]
Part/start of the log file:

```text
2024-07-12 08:05:01 INFO No Globus UUID defined.
2024-07-12 08:05:01 INFO No data transfer node defined.
2024-07-12 08:05:01 INFO Converting mapfile...
2024-07-12 08:05:01 INFO Running Extraction...
2024-07-12 08:05:01 INFO Making dataset...
2024-07-12 08:05:01 WARNING experiment does not agree! subseasonal != mfc20240708
2024-07-12 08:05:01 WARNING product does not agree! restricted != output
2024-07-12 08:05:01 INFO Updating...
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO INFO: Found previous version, updating the record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r77i1p1.DWD-EPISODES2022.v1-r1.day.sfcWind.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO Running index pub...
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r77i1p1.DWD-EPISODES2022.v1-r1.day.sfcWind.v20240709.sfcWind_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r77i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r77i1p1.DWD-EPISODES2022.v1-r1.day.sfcWind.v20240709.sfcWind_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r77i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r77i1p1.DWD-EPISODES2022.v1-r1.day.sfcWind.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r77i1p1.DWD-EPISODES2022.v1-r1.day.sfcWind.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO Done. Cleaning up.
2024-07-12 08:05:02 INFO No Globus UUID defined.
2024-07-12 08:05:02 INFO No data transfer node defined.
2024-07-12 08:05:02 INFO Converting mapfile...
2024-07-12 08:05:02 INFO Converting mapfile...
2024-07-12 08:05:02 INFO Running Extraction...
2024-07-12 08:05:02 INFO Running Extraction...
2024-07-12 08:05:02 INFO Making dataset...
2024-07-12 08:05:02 INFO Making dataset...
2024-07-12 08:05:02 WARNING experiment does not agree! subseasonal != mfc20240708
2024-07-12 08:05:02 WARNING experiment does not agree! subseasonal != mfc20240708
2024-07-12 08:05:02 WARNING product does not agree! restricted != output
2024-07-12 08:05:02 WARNING product does not agree! restricted != output
2024-07-12 08:05:02 INFO Updating...
2024-07-12 08:05:02 INFO Updating...
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO [SUCCESS] Number of records updated: 1
2024-07-12 08:05:02 INFO INFO: Found previous version, updating the record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO INFO: Found previous version, updating the record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO Running index pub...
2024-07-12 08:05:02 INFO Running index pub...
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709.rsds_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r4i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709.rsds_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r4i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709.rsds_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r4i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709.rsds_day_IFS24--DWD-EPISODES2022--HYR-5_mfc20240708_r4i1p1_20240708-20240822.nc|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO [SUCCESS] Published record: ClimatePredictionsDE.subseasonal.restricted.output.HYR-5.DWD.IFS24.mvh20240708.mfc20240708.r4i1p1.DWD-EPISODES2022.v1-r1.day.rsds.v20240709|esgf-data.dwd.de
2024-07-12 08:05:02 INFO Done. Cleaning up.
2024-07-12 08:05:02 INFO Done. Cleaning up.
2024-07-12 08:05:02 INFO No Globus UUID defined.
2024-07-12 08:05:02 INFO No data transfer node defined.
2024-07-12 08:05:02 INFO Converting mapfile...
2024-07-12 08:05:02 INFO Converting mapfile...
2024-07-12 08:05:02 INFO Converting mapfile...
2024-07-12 08:05:02 INFO Running Extraction...
2024-07-12 08:05:02 INFO Running Extraction...
2024-07-12 08:05:02 INFO Running Extraction...
2024-07-12 08:05:02 INFO Making dataset...
2024-07-12 08:05:02 INFO Making dataset...
2024-07-12 08:05:02 INFO Making dataset...
[...]
```

sashakames commented 1 month ago

I am able to reproduce the issue when running with --map <directory>/. Two workarounds: (1) use a shell loop to iterate through the mapfiles and pass each one with --map <filename>.map; (2) pass --silent, which should suppress the output.
Issue under investigation...
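
A minimal sketch of workaround (1), written in Python rather than as a shell loop so that it can be dry-run first. It assumes the mapfiles end in `.map` and that `esgpublish` is on the `PATH`. The helper names `build_commands` and `publish_each` are hypothetical, and any extra flags are simply whatever you already pass to `esgpublish`.

```python
import subprocess
from pathlib import Path


def build_commands(map_dir, extra_args=()):
    """Build one esgpublish command per *.map file in map_dir, sorted by name."""
    commands = []
    for mapfile in sorted(Path(map_dir).expanduser().glob("*.map")):
        # One invocation per mapfile instead of passing the whole directory.
        commands.append(["esgpublish", "--map", str(mapfile), *extra_args])
    return commands


def publish_each(map_dir, extra_args=(), dry_run=True):
    """Run (or, with dry_run, only print) one esgpublish call per mapfile."""
    for cmd in build_commands(map_dir, extra_args):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first mapfile that fails to publish.
            subprocess.run(cmd, check=True)
```

With the options from the original report, a dry run would look like `publish_each("~/import-test/map/", ["--no-replica", "--project", "climatepredictionsde", "--no-auth"])`, with `--config` added the same way. Since each mapfile gets its own process, anything accumulated during one run should not carry over into the next, though this has not been verified against the publisher itself.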