FoldingAtHome / fah-issues


Show FLOPs per work unit or time spent per work unit (or a user's cumulative total instead of per work unit) #1554

Open MyGithubNotYours opened 4 years ago

MyGithubNotYours commented 4 years ago

Context

I called the IRS today. They told me that my electricity could be considered a non-cash donation, but that I'd need records of my costs.

I figure that I can calculate electricity costs if I know at least 1 of the following:

  • FLOPs per work unit
  • time spent per work unit (or my cumulative totals instead of per-work-unit numbers)

As far as I can tell, GPU usage seems to be either 0% or 100%, so a conversion from time to costs should work - for example, my RTX 2060 seems to always draw ~145 Watts at 99% usage. However, multi-core CPU usage might be too complicated and intermittent to convert from time to costs?
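For illustration, assuming a hypothetical rate of $0.13/kWh: 145 W × 24 h = 3.48 kWh per day of folding, so about 3.48 × $0.13 ≈ $0.45 per day for the GPU alone.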


Describe the Feature

Show the FLOPs, GPU time spent, or whatever metric can be converted into electricity costs in one of these ways (in order of ease & quick development time):

  1. Show the metrics in either the log or the client console.

  2. Show the metrics in the certificate (the one that says "this certifies that xxxx has folded yy workunits") or somewhere on the F@H servers so users can't forge fake numbers.

  3. Autofill the IRS non-cash charitable donation form and make it accessible to users in one of these ways (or a better way):

     • email the auto-filled form to users
     • let users download the auto-filled form from their stats.foldingathome.org/donor/{donorname} webpage
     • let users download the auto-filled form from the console
     • some other way???

Other required tasks

@jcoffland

PantherX commented 4 years ago

Hiya @MyGithubNotYours! This is an interesting request. Out of curiosity, have you read this article, which talks about FLOPS and F@H (https://foldingathome.org/support/faq/flops/)?

You can already calculate the time per WU based on when the FahCore starts and finishes assuming no interruptions while folding.

You might be able to calculate the cost using this method (https://www.reddit.com/r/EtherMining/comments/6lqb1l/how_to_calculate_gpu_electricity_cost/).

Also, if you look up your username (which is not unique) on the official stats (https://stats.foldingathome.org/donors), you can click on the WU and get a certificate. However, the certificate is only generated at certain milestones, so it isn't updated with every single WU submitted.

UofM-MartinK commented 4 years ago

This topic is very interesting to me personally, since I am looking into metrics to estimate the energy efficiency of my folding rigs, and of folding in general (plus solar tax credits etc., so I also have some interest in the IRS angle).

I see many possible avenues to estimate or calculate the energy used, but the more accurate ones in particular might be too hard to explain to the IRS :)

One very simple approach could be, if you still have access to all your log files:

Archive those log files as the original records. One could write a script (I would volunteer) which extracts the time your GPU was actually folding from those log files, then multiply that time by the power consumption you measured while your rig was folding.
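As a minimal sketch of what such a script could look like - note that the start/finish patterns below are my assumptions, not the documented FAHClient log format, so they would need to be adjusted to the real log lines:

```python
import re
import sys
from datetime import datetime, timedelta

# Hypothetical placeholders for whatever your FAHClient log actually
# prints when a FahCore starts and finishes a WU.
START = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+:FS\d+).*Started FahCore")
FINISH = re.compile(r"^(\d\d:\d\d:\d\d):(WU\d+:FS\d+).*FINISHED_UNIT")

GPU_WATTS = 145  # power draw you measured at the wall while the GPU slot folded

def folding_time(path):
    """Sum the time between each slot's start and finish markers."""
    started = {}          # "WUxx:FSyy" -> start timestamp
    total = timedelta()
    with open(path) as log:
        for line in log:
            if m := START.match(line):
                started[m.group(2)] = datetime.strptime(m.group(1), "%H:%M:%S")
            elif (m := FINISH.match(line)) and m.group(2) in started:
                t0 = started.pop(m.group(2))
                t1 = datetime.strptime(m.group(1), "%H:%M:%S")
                if t1 < t0:                 # crude midnight-rollover handling
                    t1 += timedelta(days=1)
                total += t1 - t0
    return total

if __name__ == "__main__":
    total = sum((folding_time(p) for p in sys.argv[1:]), timedelta())
    kwh = GPU_WATTS * total.total_seconds() / 3600 / 1000
    print(f"{total} of GPU folding, ~{kwh:.2f} kWh at {GPU_WATTS} W")
```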

Assuming you usually also have the CPU cores folding, just measure the power consumption under "full folding load" - no need to complicate the process, and I think it's a fair assumption that the CPU is usually always busy.

While those log files themselves are not really forge-proof, they should suffice for the IRS. A log file is also verifiable to a large extent, because each processed WU can be named and checked against https://apps.foldingathome.org/wu to confirm it was actually processed - even more so if you folded with a username (but even anonymously that should be fine).

MyGithubNotYours commented 4 years ago

@UofM-MartinK

Archive those log files as the original records. One could write a script (I would volunteer) which extracts the time your GPU was actually folding from those log files, then multiply that time by the power consumption you measured while your rig was folding.

Sounds nice. Let's see what the FAH developers say in case they know something we don't (since it's their code).

Assuming you usually also have the CPU cores folding, just measure the power consumption under "full folding load" - no need to complicate the process, and I think it's a fair assumption that the CPU is usually always busy.

I'm not sure the IRS will like that assumption. For other deductions similar to this one, the IRS wants taxpayers to separate deductible usage from non-deductible usage; they don't want you taking advantage of their deductions. For me at least, my multi-core CPU is not always busy. Here's a chart from AnandTech of power usage vs. active cores: [image] Also, sometimes I set my FAH to a limited number of threads so that I can still use my computer, so I don't think assuming full load would be safe in that case.

While those log files themselves are not really forge-proof, they should suffice for the IRS. A log file is also verifiable to a large extent, because each processed WU can be named and checked against https://apps.foldingathome.org/wu to confirm it was actually processed - even more so if you folded with a username (but even anonymously that should be fine).

That makes sense. But I'd like to take as few risks with the IRS as possible haha. Let's see whether the FAH developers have any interest in making an official solution. If not, then I'll probably bug you for your scripts :p

PantherX commented 4 years ago

@MyGithubNotYours Considering that your system isn't dedicated to folding, I would set my CPU slot to 12 or 16 threads and then simply forget about it. It will continue to fold while leaving enough CPU free for other tasks. Since CPU folding happens at the lowest priority by default, it should not interfere with most applications. Feel free to test that out and see what works best for you 😄

MyGithubNotYours commented 4 years ago

@PantherX

This is an interesting request. Out of curiosity, have you read this article, which talks about FLOPS and F@H (https://foldingathome.org/support/faq/flops/)?

Yeah, I saw that page, but I didn't see any way to calculate FLOPs (vs FLOPS). Let me know if I missed it.

You can already calculate the time per WU based on when the FahCore starts and finishes assuming no interruptions while folding.

Sometimes I have many interruptions :p

You might be able to calculate the cost using this method (https://www.reddit.com/r/EtherMining/comments/6lqb1l/how_to_calculate_gpu_electricity_cost/).

Nice, thanks. If the FAH developers can't or don't want to create an official solution, then maybe I'll use the method you linked to.

Also, if you look up your username (which is not unique) on the official stats (https://stats.foldingathome.org/donors), you can click on the WU and get a certificate. However, the certificate is only generated at certain milestones, so it isn't updated with every single WU submitted.

Yeah, that's the certificate that I mentioned in my post. I figure that would be a nice place to display a user's total FLOPs (not FLOPS) or GPU time. The benefit is that the numbers would be less forgeable in the IRS's eyes, because the info is not coming from the user.

PantherX commented 4 years ago

@MyGithubNotYours It is my understanding that FLOPs and FLOPS are the same thing, just written differently (https://en.wikipedia.org/wiki/FLOPS). If that's not the case, could you please provide a link that describes what FLOPs is, to ensure everyone is on the same page 😄

UofM-MartinK commented 4 years ago

OP probably means the plural of a FLOP (FLoating point OPeration), i.e., just the absolute number of floating-point computations performed. That should be almost proportional to the sum of the "base credit" of all WUs computed (see https://foldingathome.org/support/faq/points/ and the FAHClient's "base credits" for the WU currently processed) - base credit is essentially the time needed on a reference system, which is generally proportional to the number of computations the work requires. While this is one possible measure of work performed, there are various issues with converting that metric back to energy:

A) Most importantly, the slower you perform a computation (e.g., at a lower clock speed) and the better the system's cooling, the less energy it needs - and that's just for a single system.

B) Also, the more modern the CPU or GPU (e.g., due to smaller transistor sizes, energy management, more sophisticated compute units, ...), the more energy-efficiently a single computation is usually performed.

If you say that calibrating your actual system is not an option - because it could be "gamed", and/or you would also get a tax credit for energy your system used to do something else - I basically see only one way out: an "official" energy value for the work performed would have to be determined. And because any work can in general be completed at arbitrary efficiency (see A), base credit alone might not be sufficient. Now, there is of course a time limit associated with each WU, which makes both the "base credit" for all completed WUs and the "actual credit" including the quick-return bonus ("QRB") - calculated per day, also known as "Points Per Day" or PPD - potential candidates for "normalizing" to energy.

But now a person with the latest, greatest GPU would fold far more energy-efficiently than somebody with an older GPU or just a modern CPU - and there is no way F@h can officially acknowledge or testify which system you actually used to complete the work.

So one way out of this dilemma is "officially" using a very modern benchmark system, or comparing against the theoretical performance of the latest, greatest GPU tech out there at the end of a tax year. This reference system would have to be updated as soon as a more energy-efficient GPU comes out. Its "PPD" can be calculated against the energy efficiency it is running at, which, due to the nature of the QRB equation, will have a clear optimum: basically PPD/Watt. (Note: that optimum will not be at peak performance, but in an "underclocked" scenario where the GPU is perhaps only using 20-30% of its peak wattage while usually still delivering 70-80% of its computational performance per unit time.)

That "optimal" PPD/watt can be converted to "F@h points/kWh", and thus any donor's "Energy donation at theoretical peak efficiency" can be calculated by the inverse of that. But since most folders will fold with systems at only of fraction of that Energy efficiency, they will also only be able to claim a small fraction of the energy cost they actually donated for folding in their tax return.

But: no user would be able to claim MORE energy as a tax credit than they actually used, unless they had access to a more energy-efficient system than the reference system, which should be impossible.

In general, I see big pros and cons with that approach:

PRO: It would immediately show every folder how "green" or "energy efficient" they are, so they could optimize in that direction.

CON: It would immediately show every folder how "green" or "energy efficient" they are, and all the overclocking and other fun to squeeze the maximum PPDs out of their hardware could not feel so good anymore.

It might also disincentivize many CPU folders, many of whom wouldn't upgrade - so there is a risk that over-emphasizing energy efficiency could come at a big cost to the overall performance of the F@h supercomputer.

MyGithubNotYours commented 4 years ago

@jcoffland Any news or opinion on this?

jchodera commented 4 years ago

Hi all!

Disclaimer: This is in no way intended to be tax advice, and we are not qualified to comment on anything related to income taxes. We're looking at providing you this capability for informational purposes only.

It looks like the NVML API allows us to monitor the instantaneous GPU power usage during execution on NVIDIA GPUs. We could report the total energy used (e.g., in kWh) for each WU in the client log, which could be analyzed to compute how much energy was used in a given interval. In principle, this would allow someone who knew their average electricity cost (in $/kWh) to compute how much was spent on just operating the GPU. (There may even be some way we can integrate these calculations into a web app if we can accumulate the statistics from the servers.)
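For illustration, a rough standalone sketch of that sampling idea using the NVML Python bindings (pip install nvidia-ml-py) - this is just the concept, not the actual core22 implementation, and the device index and sampling interval are arbitrary choices:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first NVIDIA GPU

INTERVAL = 5.0   # seconds between samples (arbitrary choice)
joules = 0.0
try:
    while True:
        watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
        joules += watts * INTERVAL        # integrate power over the sample interval
        print(f"{watts:6.1f} W, total so far: {joules / 3.6e6:.4f} kWh")
        time.sleep(INTERVAL)
finally:
    pynvml.nvmlShutdown()
```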

Questions:

  • Would this be useful?
  • Does anyone know of equivalent AMD and Intel GPU APIs?
  • Is there a standard API for looking up electricity costs as well (in case they fluctuate in time)?

MyGithubNotYours commented 4 years ago

@jchodera Wow thanks for looking into this!

To answer your questions:

  1. Yes, I think that would be useful! It seems like exactly the kind of metric I was seeking. How difficult would it be, and how long do you estimate it will take, to provide that in the client log? I'm not trying to rush you - I'm just curious when this might be available to us and how much trouble I'm causing you guys :p

  2. I don't know, but if there isn't an equivalent for AMD GPUs, will that affect your willingness to implement something like this? UPDATE: A quick search suggests there's an AMD equivalent to nvidia-smi called "rocm-smi" that shows power usage for AMD GPUs: https://github.com/RadeonOpenCompute/ROC-smi

  3. I don't know. Let me (or someone else) get back to you. UPDATE: Here's what I could find so far (see the sketch after this list): https://developer.nrel.gov/docs/electricity/utility-rates-v3/ https://catalog.data.gov/dataset/electricity-data-average-retail-price-of-electricity-application-programming-interface-api
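For example, a quick sketch against the NREL utility-rates v3 endpoint from the first link - the response fields ("outputs", "residential") are my reading of the docs, so double-check them before relying on this:

```python
import requests

API_KEY = "DEMO_KEY"           # NREL issues free API keys; DEMO_KEY is rate-limited
LAT, LON = 40.7128, -74.0060   # example coordinates (New York City)

resp = requests.get(
    "https://developer.nrel.gov/api/utility_rates/v3.json",
    params={"api_key": API_KEY, "lat": LAT, "lon": LON},
    timeout=10,
)
resp.raise_for_status()
outputs = resp.json().get("outputs", {})
print("average residential rate ($/kWh):", outputs.get("residential"))
```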

UofM-MartinK commented 4 years ago

* Would this be useful?

Yes - it seems to be accurate within 5%, and updated frequently enough to yield OK numbers. Some field tests might be advisable, comparing with other power metrics and total system power, to see whether those numbers make sense in the bigger picture. I could actively support those testing efforts on beta clients/cores etc., especially on Linux, because I routinely use several power-monitoring tools and trackers on some systems.

* Does anyone know of equivalent AMD and Intel GPU APIs?

Under Linux, I regularly monitor GPU power on AMD cards as exposed to the kernel - not sure how that relates to the OpenCL API, but I could look into that; it would be nice if it's the same on Windows and Linux. No experience with Intel GPUs.

I don't see a reliable way to estimate CPU power consumption, so presenting the actual work performed in some other form might still be a good idea - for example, logging the "base credit" next to the "credit estimate"?

(Under the assumption that the "base credit" is the best estimate for actual computational work performed, can somebody from the FAH internal team comment on that?)

* Is there a standard API for looking up electricity costs as well (in case they fluctuate in time)?

Some APIs like those MyGithubNotYours provided exist, but I am not aware of anything global, or of a global standard (yet). The user would have to provide a link or database to convert kWh at a given time into EUR/$/their currency? I will do some digging to see if any standard for that exists...

jchodera commented 4 years ago

Yes, I think that would be useful! It seems like exactly the kind of metric I was seeking. How difficult would it be, and how long do you estimate it will take, to provide that in the client log? I'm not trying to rush you - I'm just curious when this might be available to us and how much trouble I'm causing you guys :p

It shouldn't be hard to include for NVIDIA cards in core22, but it may not make the core22 0.0.12 release. Should be easy to tackle right after that if we can't fit it into 0.0.12, though!

A quick search suggests there's an AMD equivalent to nvidia-smi called "rocm-smi" that shows power usage for AMD GPUs:

Great find! It looks like the sysfs interface may make it easy to query this for those with ROCm and the lib-sensors library: https://rocmdocs.amd.com/en/latest/ROCm_System_Managment/ROCm-System-Managment.html#sysfs-interface
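For example, on Linux something like this might work - note this is an assumption about the amdgpu hwmon layout (many cards expose power1_average in microwatts under /sys/class/hwmon), not the documented ROCm SMI API:

```python
import glob
from pathlib import Path

# Scan hwmon devices for the amdgpu driver and read its power sensor.
for hwmon in glob.glob("/sys/class/hwmon/hwmon*"):
    name_file = Path(hwmon, "name")
    power_file = Path(hwmon, "power1_average")
    if name_file.exists() and name_file.read_text().strip() == "amdgpu" \
            and power_file.exists():
        watts = int(power_file.read_text()) / 1_000_000   # microwatts -> watts
        print(f"{hwmon}: {watts:.1f} W")
```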

Here's what I could find so far: https://developer.nrel.gov/docs/electricity/utility-rates-v3/ https://catalog.data.gov/dataset/electricity-data-average-retail-price-of-electricity-application-programming-interface-api

Awesome! Thanks for digging these up. This part might come later, but it will still be neat.

bb30994 commented 3 years ago

I don't think the difference between GFLOPS and GFLOPs was ever clarified. The way I use them is: FLOPS means FLoating point OPerations per Second [a speed]; FLOPs means FLoating point OPeration(s) [a plural count]. Others may use them differently.