FAIRiCUBE / FAIRiCUBE-Hub-issue-tracker

FAIRiCUBE HUB issue tracker
Creative Commons Zero v1.0 Universal

Record computational demands automatically #16

Closed jetschny closed 1 month ago

jetschny commented 1 year ago

see summary of deliverable D3.3

All use cases have now started to execute data-driven processing techniques to answer their respective scientific questions, and we have only started to document the computational resources that have been used. This collection will grow over time and will be a valuable catalogue for estimating the demands of similar tasks, giving insights for further optimization, and weighing computational costs against gained improvements. For now, we have started to standardize the collection of numerical parameters as tables, and we will see whether additional parameters need to be included in the future. Further focus will be on how to collect, manage and make available these parameters more systematically, automatically and transparently. Ideally these parameters are collected as runtime variables while executing the code and piped into a metadata recording system.

Can we create a Python function that is called at the beginning and at the end of a script and that automatically records all essential parameters, such as execution time, memory consumption, etc. (see table 1 and the others that follow)? The output should be a CSV file according to table 1.
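A minimal sketch of what such a start/end recorder could look like, using only the standard library. The name `record_demands` and the CSV column names are hypothetical, since table 1 of D3.3 is not reproduced in this thread. Only timing is recorded here; memory and further parameters would be added as extra columns.

```python
import csv
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from pathlib import Path


@contextmanager
def record_demands(csv_path, label):
    """Time the enclosed code block and append one row to a CSV file.

    Column names are placeholders, not the ones defined in table 1.
    """
    started_at = datetime.now(timezone.utc).isoformat()
    wall_start = time.perf_counter()  # wall-clock timer
    cpu_start = time.process_time()   # CPU time of this process
    try:
        yield
    finally:
        # Runs at the end of the block, even if the block raised an error.
        row = {
            "label": label,
            "started_at_utc": started_at,
            "wall_time_s": round(time.perf_counter() - wall_start, 6),
            "cpu_time_s": round(time.process_time() - cpu_start, 6),
        }
        path = Path(csv_path)
        write_header = not path.exists()
        with path.open("a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(row))
            if write_header:
                writer.writeheader()
            writer.writerow(row)
```

Wrapping the body of a script in `with record_demands("demands.csv", "my_use_case"):` would then add one row per run, so the file grows into the kind of catalogue described above.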

cozzolinoac11 commented 1 year ago

Hi @jetschny we are working on this issue and at the moment, with a very simple block of code to evaluate, the 'measurer' returns this table in CSV format. Do you have any ideas or comments?

[Image: Screenshot 2023-06-28 161846]

In addition, we are working on making the calculation even more efficient and on including the section regarding costs. These, however, seem to depend very much on the platform, type of resource, geographical region, contract, etc. Is there any more information we can use?

KathiSchleidt commented 1 year ago

Cool! @jetschny is the expert, but a few things I noticed:

jetschny commented 1 year ago

that is a very good start, many thanks for this! @BachirNILU might have some time to test this out on some UC examples. where is the code and how can this be called? regarding the measures the following measures

in a nutshell, this is really all good and we need to look into the actual implementation. regarding cost, this might not be possible as this is going deep into the AWS world, but if you can get it at runtime (pricing, type of AWS compute instances) that would be awesome!

cozzolinoac11 commented 1 year ago

Clearly there was an error in 'Data size in grid points' (thanks to both for spotting this) and we are working on fixing it :smile: At the moment, these values are passed using a variable while a more automatic solution is investigated.

The value of CO2 emitted is calculated using CodeCarbon's EmissionsTracker (its online version automatically detects the location of execution). For the main memory usage measurement, tracemalloc is used, while psutil is used for the other metrics.
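As an illustration of how these three tools could be combined, here is a hedged sketch. It is not the code in the common-code repository. The function name `measure`, its `track_emissions` flag and the metric keys are hypothetical. psutil and CodeCarbon are third-party packages, so they are treated as optional here and skipped when not installed.

```python
import tracemalloc

try:
    import psutil  # third-party, optional in this sketch
except ImportError:
    psutil = None

try:
    from codecarbon import EmissionsTracker  # third-party, optional
except ImportError:
    EmissionsTracker = None


def measure(func, *args, track_emissions=False, **kwargs):
    """Run func and return (result, metrics). Metric names are placeholders."""
    metrics = {}
    tracker = None
    if track_emissions and EmissionsTracker is not None:
        tracker = EmissionsTracker()
        tracker.start()

    tracemalloc.start()  # traces memory allocated through Python's allocator
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
        metrics["peak_python_memory_bytes"] = peak
    finally:
        tracemalloc.stop()
        if tracker is not None:
            # stop() returns the estimated emissions in kg of CO2-equivalent
            metrics["co2_kg"] = tracker.stop()

    if psutil is not None:
        metrics["process_rss_bytes"] = psutil.Process().memory_info().rss
        metrics["logical_cpu_count"] = psutil.cpu_count(logical=True)
        metrics["total_ram_bytes"] = psutil.virtual_memory().total
    return result, metrics
```

A call such as `result, metrics = measure(my_processing_step, data)` returns the function's own result alongside a dictionary that can be written out as one CSV row. Note that tracemalloc only sees allocations made through Python's memory allocator, which is why the process-level RSS from psutil is reported separately.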

A current version of the code has been uploaded to the common-code repository under record-computational-demands-automatically. This version is not yet final, and we are working on improving and, if necessary, correcting it. Any ideas or comments are welcome.

BachirNILU commented 10 months ago

@cozzolinoac11 thank you for this interesting code, it will be helpful for us. I started testing the code (following the example you provide), but I keep getting the following error: NotADirectoryError: [WinError 267] The directory name is invalid: 'C'. Am I missing something? Thanks in advance,

-Bachir.

cozzolinoac11 commented 10 months ago

Hi @BachirNILU, I just made a small update to the code. Could you please try to run it again and let me know which command gives the error? Thanks! :smile:

BachirNILU commented 10 months ago

Hi,

Thanks @cozzolinoac11. Looks like @mari-s4e already managed to run the code (after correcting minor typos). She will also propose some improvements shortly. Thanks again for this work.

Best regards, -Bachir.

mari-s4e commented 10 months ago

Hi, yes, the code works for me (the bug I found was the same one you, @cozzolinoac11, found ;) ). I opened a PR to add logging functionality as an option (I like to log in-between steps of the processes so that I can trace back later what was going on and when). I also opened an issue because the Essential libraries part does not work as I expected. Happy to contribute further!
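To illustrate the idea of optional logging between processing steps, here is a small sketch based on Python's standard `logging` module. It is not the implementation from the PR. The helper name `make_step_logger` and its parameters are hypothetical.

```python
import logging
import time


def make_step_logger(name="measurer", logfile=None):
    """Return a function that logs a message with the time elapsed since creation.

    If logfile is None, messages go to the console instead of a file.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(logfile) if logfile else logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)

    start = time.perf_counter()

    def log_step(message):
        elapsed = time.perf_counter() - start
        logger.info("%s (%.3f s since start)", message, elapsed)
        return elapsed

    return log_step
```

Calling `log_step = make_step_logger(logfile="run.log")` once and then `log_step("data loaded")` after each stage leaves a timestamped trace that can be read back later to see what happened and when.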

cozzolinoac11 commented 10 months ago

Hi,

Thanks for the ideas and input 😄

cozzolinoac11 commented 10 months ago

Hi @mari-s4e @BachirNILU I just made some changes to the measurer and the example code to solve the problem of missing libraries. Any ideas or comments welcome!

jetschny commented 1 month ago

I consider this complete: there are no open issues, and it can be reopened if needed.