jetschny closed this issue 1 month ago
Hi @jetschny we are working on this issue and at the moment, with a very simple block of code to evaluate, the 'measurer' returns this table in CSV format. Do you have any ideas or comments?
In addition, we are working on making the calculation even more efficient and on including the section regarding costs. These however seem to depend very much on the platform, type of resource, geographical region, contract, etc. Is there any more information we can use?
Cool! @jetschny is the expert, but a few things I noticed:
That is a very good start, many thanks for this! @BachirNILU might have some time to test this out on some UC examples. Where is the code and how can it be called? Regarding the measures:
In a nutshell, this is all really good and we need to look into the actual implementation. Regarding cost, this might not be possible as it goes deep into the AWS world, but if you can get it at runtime (pricing, type of AWS compute instances) that would be awesome!
Clearly there was an error in 'Data size in grid points' (thanks to both for spotting this) and we are working on fixing it :smile: At the moment, these values are passed using a variable while a more automatic solution is investigated.
The CO2 emissions value is calculated using CodeCarbon's EmissionsTracker (its online version automatically detects the location of execution). Main memory usage is measured with tracemalloc, and the other metrics with psutil.
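For readers who have not looked at the repository yet, here is a minimal sketch of how tracemalloc and psutil can be combined around a block of code. It is not the measurer from common-code: the function name `measure` and the metric keys are assumptions made for illustration, psutil is treated as optional, and CodeCarbon's EmissionsTracker is left out to keep the sketch dependency-light.

```python
import time
import tracemalloc

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def measure(func, *args, **kwargs):
    """Run func and return (result, metrics).

    Hypothetical helper: the name and the metric keys are illustrative,
    not those used by the measurer in the common-code repository.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        # Peak size of memory blocks allocated by Python since start()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    elapsed = time.perf_counter() - start

    metrics = {
        "wall_time_s": elapsed,
        "peak_python_memory_bytes": peak,
    }
    if psutil is not None:
        proc = psutil.Process()
        # Resident set size of the whole process at the end of the run
        metrics["rss_bytes"] = proc.memory_info().rss
        metrics["logical_cpus"] = psutil.cpu_count(logical=True)
    return result, metrics
```

Note that tracemalloc only sees allocations made through Python's memory allocator, so the peak value can differ from the process-level figure that psutil reports.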
A current version of the code has been uploaded to the common-code repository under record-computational-demands-automatically. This version is not yet final; we are working on improving it and, if necessary, correcting it. Any ideas or comments are welcome.
@cozzolinoac11 thank you for this interesting code, it will be helpful for us. I started testing the code (following the example you provide), but I keep getting the following error: NotADirectoryError: [WinError 267] The directory name is invalid: 'C'. Am I missing something? Thanks in advance,
-Bachir.
Hi @BachirNILU I just made a small update to the code. Could you please try to run it again and let me know which command gives the error. Thanks! :smile:
Hi,
Thanks @cozzolinoac11. Looks like @mari-s4e already managed to run the code (after correcting minor typos). She will also propose some improvements shortly. Thanks again for this work.
Best regards, -Bachir.
Hi, yes the code works for me (the bug I found was the same one you found, @cozzolinoac11 ;) )
I opened a PR to add logging functionality as an option (I like to log in-between steps of the processes so that I can trace back later what was going on and when).
I also opened an issue because the "Essential libraries" part does not work as I expected.
Happy to contribute further!
Hi,
Thanks for the ideas and input 😄
Hi @mari-s4e @BachirNILU I just made some changes to the measurer and the example code to solve the problem of missing libraries. Any ideas or comments welcome!
I consider this as complete, no open issues, can be re-opened if needed
see summary of deliverable D3.3
All use cases have now started to execute data-driven processing techniques to answer their respective scientific questions, and we have only started to document the computational resources that have been used. This collection will grow over time and will be a valuable catalogue to estimate the demands for similar tasks, give insights for further optimization, and is essential input to weigh computational costs against gained improvements. As for now, we have started to standardize the collection of numerical parameters as tables, and we will see if there will be additional parameters to be included in the future. Further focus will be on how to collect, manage and make available these parameters more systematically, automatically and transparently. Ideally these parameters are collected as runtime variables while executing the code and piped into a metadata recording system.
Can we create a Python function that is called at the beginning and end of a script and that automatically records all essential parameters such as execution time, memory consumption, etc. (see Table 1 and the others that follow)? The output should be a CSV file according to Table 1.
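One possible shape for such a function is a context manager that starts the measurements on entry and appends one CSV row on exit. The sketch below uses only the standard library; the name `record_demands` and the column names are assumptions for illustration and would need to be aligned with the actual columns of Table 1.

```python
import csv
import os
import time
import tracemalloc
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def record_demands(csv_path, label="run"):
    """Measure the wrapped block and append one row to csv_path.

    Hypothetical sketch: the column names below are placeholders,
    not the columns of Table 1 in the deliverable.
    """
    started_at = datetime.now(timezone.utc).isoformat()
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        row = {
            "label": label,
            "started_at_utc": started_at,
            "wall_time_s": f"{elapsed:.6f}",
            "peak_python_memory_bytes": peak,
        }
        # Write the header only when the file is new or empty
        write_header = (
            not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
        )
        with open(csv_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row))
            if write_header:
                writer.writeheader()
            writer.writerow(row)
```

A script would then wrap its main body in `with record_demands("demands.csv", label="uc1"):`, and each run adds one row to the file, even if the block raises an exception.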