GoldenCheetah / OpenData

A project to collect, collate and share an open data set with contributions from users of the GoldenCheetah application

Power and HR metrics in JSON file #25

Closed IvanLiuTW closed 1 year ago

IvanLiuTW commented 1 year ago

Hi all,

I am a data engineer at Giant Group, and I am currently working with our sports science team on cycling research.

Before I ask my question, I'd like to thank you for your work and your contribution to this open data project. I also appreciate all of the participants who joined this project and their willingness to share their data. The data is very valuable to me. Thank you. 🙂

Each athlete has a JSON file, and each activity in it carries metrics like the following:

 'average_hr': ['112.11171', '7045.00000'],
 'average_ct': ['37.62149', '7045.00000'],
 'heartbeats': '13163.78335',
 'average_cad': ['46.85570', '4366.00000'],
 'average_temp': ['18.43206', '7043.00000'],
 'max_heartrate': '157.00000',
 'min_heartrate': '82.00000',
 'max_ct': '37.88746',
 'max_speed': '50.78880',
 'max_cadence': '107.00000',
 'max_temp': '21.00000',
 'min_temp': '17.00000',
 'ninety_five_percent_hr': '142.00000',
 'vam': '83.55111',
 'gradient': '0.50495',
 'total_kcalories': '890.76413',
 'activity_crc': '1361582490897.00000',
 'cp_setting': '211.00000',
 'cpsolver_best_r': '-255.00000',
 'time_in_zone_H1': '1347.00000',
 'time_in_zone_H2': '116.00000',
 'percent_in_zone_H1': ['17.74236', '7592.00000'],
 'percent_in_zone_H2': ['1.52792', '7592.00000'],
 'time_in_zone_P1': '254.00000',
 'time_in_zone_P2': '224.00000',
 'time_in_zone_P3': '266.00000',
 'time_in_zone_P4': '375.00000',
 'time_in_zone_P5': '4859.00000',
 'percent_in_zone_P1': ['3.34563', '7592.00000'],
 'percent_in_zone_P2': ['2.95047', '7592.00000'],
 'percent_in_zone_P3': ['3.50369', '7592.00000'],
 'percent_in_zone_P4': ['4.93941', '7592.00000'],
 'percent_in_zone_P5': ['64.00158', '7592.00000'],
 'best_50m': '0.06667',
 'best_100m': '0.13333',
 'best_200m': '0.25000',
 'best_400m': '0.51667',
 'best_500m': '0.66667',
 'best_800m': '1.23333',
 'best_1000m': '1.68333',
 'best_1500m': '2.75000',
 'best_2000m': '4.18333',
 'best_3000m': '7.13333',
 'best_4000m': '9.30000',
 'best_5000m': '11.45000',
 'best_10km': '27.70000',
 'best_15km': '42.18333',
 'best_20km': '57.71667',
 'best_half_marathon': '61.01667',
 'best_30km': '104.20000',
 '1m_critical_power_hr': '105.66102',
 '5m_critical_power_hr': '117.78595',
 '10m_critical_power_hr': '123.15902',
 '20m_critical_power_hr': '121.68988',
 '30m_critical_power_hr': '121.15335',
 '60m_critical_power_hr': '120.85567',
 '1m_peak_hr': '152.52500',
 '2m_peak_hr': '146.14167',
 '3m_peak_hr': '144.13704',
 '5m_peak_hr': '137.04133',
 '8m_peak_hr': '127.97396',
 '10m_peak_hr': '127.73917',
 '20m_peak_hr': '122.53867',
 '30m_peak_hr': '121.54956',
 '60m_peak_hr': '116.19792',
 '90m_peak_hr': '110.95799',

This is from one of the athletes. It's really exciting to see these metrics, but I am wondering whether there is any documentation that explains how some of them (critical_power_hr and gradient, for instance) are computed and what some of the abbreviations mean.
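For readers working with the listing above, here is a minimal sketch of how the two value shapes could be normalized: plain strings such as `'max_heartrate'` and two-element lists such as `'average_hr'`. The helper name `parse_metric` is hypothetical, and treating the second list element as a sample count is an assumption drawn from the listing, not from any documentation.

```python
def parse_metric(raw):
    """Return (value, count) as floats; count is None for plain string metrics."""
    if isinstance(raw, (list, tuple)):
        # Assumption: list-valued metrics are [value, sample count] pairs.
        number, count = raw
        return float(number), float(count)
    return float(raw), None


# A few entries copied from the listing above.
metrics = {
    'average_hr': ['112.11171', '7045.00000'],
    'max_heartrate': '157.00000',
    'time_in_zone_H1': '1347.00000',
}

parsed = {name: parse_metric(raw) for name, raw in metrics.items()}
```

Keeping the count next to the value makes it possible to weight averages later when combining activities, if the count does turn out to be a sample count.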

I also noticed that some of the CSV files are incomplete, for example with missing power or heart rate values. I assume this is why the set of metrics in the JSON differs somewhat from one activity to another. Is that correct?
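One way to quantify that incompleteness is to measure, per activity file, the fraction of rows where a column is empty. The sketch below is only illustrative: the function name `missing_fraction`, the column names `power` and `hr`, and the idea that a missing value appears as an empty cell (rather than, say, a zero) are all assumptions to check against the actual files.

```python
import csv
import io


def missing_fraction(csv_text, columns=("power", "hr")):
    """Fraction of rows in which each column is absent or empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    missing = {c: 0 for c in columns}
    for row in reader:
        total += 1
        for c in columns:
            # A column absent from the header or a short row yields None.
            cell = (row.get(c) or "").strip()
            if cell == "":
                missing[c] += 1
    if total == 0:
        # No data rows at all: treat every column as fully missing.
        return {c: 1.0 for c in columns}
    return {c: missing[c] / total for c in columns}


# Made-up sample rows, not taken from the dataset.
sample = "secs,km,power,hr\n0,0.0,150,\n1,0.01,155,\n2,0.02,,110\n3,0.03,160,112\n"
result = missing_fraction(sample)
```

Activities whose missing fraction for a column exceeds some chosen threshold can then be excluded before comparing their metrics.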

Thanks again.

gregtkogut commented 1 year ago

For context I'm just a data hobbyist who's played with some of this data, so my answer should not be considered in any way authoritative.

The best documentation I've seen is the Golden Cheetah source code itself. Once you identify where, for example, critical power is computed from raw power samples, it's not hard to figure out. Fortunately the math is pretty simple. That said, it takes some coding knowledge to be able to sift through the code and isolate just the chunks you're looking for.
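To give a sense of how simple this kind of math can be, here is a sketch of a "peak over a duration" metric computed as the best rolling mean over a fixed window. It is not taken from the Golden Cheetah source: the function name `peak_mean` is hypothetical, it assumes one sample per second with no gaps, and whether metrics such as `5m_peak_hr` are computed exactly this way is unconfirmed.

```python
def peak_mean(samples, window):
    """Highest mean over any run of `window` consecutive samples, or None."""
    if window <= 0 or window > len(samples):
        return None
    current = sum(samples[:window])  # sum of the first window
    best = current
    for i in range(window, len(samples)):
        # Slide the window one step: add the new sample, drop the oldest.
        current += samples[i] - samples[i - window]
        best = max(best, current)
    return best / window


# Made-up heart rate samples, assumed to be recorded at 1 Hz.
hr = [100, 120, 150, 155, 130, 110]
best_2s = peak_mean(hr, 2)   # best 2-sample window is (150, 155)
best_3s = peak_mean(hr, 3)   # best 3-sample window is (150, 155, 130)
```

For a 5-minute peak on 1 Hz data the window would be 300 samples; real recordings with gaps or irregular sampling would need resampling first.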

This can also be useful for deciphering abbreviations.

There is not, to my knowledge, any detailed whitepaper or UML-like system model or other model maintained to communicate the mathematical basis for all the computation in Golden Cheetah. This is not unexpected for an open source project where precious hours of labor are often best spent producing source code.

Good luck getting a response from the Golden Cheetah maintainers. I created some pull requests here, but deleted them after a year of complete non-response. And I see a couple of others have also not gotten a response.

You might find the data useful, but a good amount of filtering work is needed to curate a candidate dataset for any given purpose. As you've started to discover, there's a lot of variation and inconsistency across the data.

I'm not criticizing either the non-response or the data quality...I recognize and value all attempts to create open, transparent data, and expect that exactly no one is obligated to assist. Just reporting my findings.

amtriathlon commented 1 year ago

> This is from one of the athletes. It's really exciting to see these metrics, but I am wondering if there is any documentation that states how some of these metrics (critical_power_hr, gradient, for instance) are computed and the meaning of some abbreviations.

You need to look at the GoldenCheetah documentation at https://github.com/GoldenCheetah/GoldenCheetah/wiki. The Glossary is a good starting point for abbreviations and metrics.

IvanLiuTW commented 1 year ago

Hi @amtriathlon and @gregtkogut

Got it, I see. I will go through the resources you pointed to. I truly appreciate the information and your replies.

Thank you!