carissalow / rapids

Reproducible Analysis Pipeline for Data Streams
http://www.rapids.science/
GNU Affero General Public License v3.0

Data cleaning #166

Closed Meng6 closed 3 years ago

Meng6 commented 3 years ago

Updated the code for data cleaning:

Need to do before merging:

JulioV commented 3 years ago

Thanks, @Meng6. Can you check the tests, please? They are failing for activity recognition.

Meng6 commented 3 years ago

Thank you @JulioV. Weiyu is working on the testing results currently. Will let you know once it is done.

Meng6 commented 3 years ago

Hi @JulioV, the testing results, example workflow, and docs were updated. Could you please double check when you are free? Thanks!

Meng6 commented 3 years ago

Hi @JulioV, we just found that there might be an issue when the time segment is very short. I split the [DATA_YIELDED_HOURS_RATIO_THRESHOLD] parameter into two parameters: [DATA_YIELD_UNIT] and [DATA_YIELD_RATIO_THRESHOLD]. I also updated the docs as you suggested. Could you double-check the latest docs? If they look good, I can merge this back into the develop branch. Thank you!

JulioV commented 3 years ago

Let's change DATA_YIELD_UNIT to DATA_YIELD_FEATURE and the possible values to RATIO_VALID_YIELDED_HOURS & RATIO_VALID_YIELDED_MINUTES

Other than this and the comment above, it's ready
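
For readers following the parameter discussion, here is a minimal Python sketch of why an hours-based yield ratio might misbehave on a very short time segment, and how a feature selector plus a separate threshold could address it. It is not the RAPIDS implementation. The function names, the per-minute boolean input, the `min_valid_minutes_per_hour` cutoff, and the way a valid hour is defined are all assumptions made for illustration. Only the parameter names DATA_YIELD_FEATURE and DATA_YIELD_RATIO_THRESHOLD and the two feature values (spelled here as RATIO_VALID_YIELDED_HOURS and RATIO_VALID_YIELDED_MINUTES) come from the thread.

```python
# Hypothetical sketch: names and logic are illustrative, not the RAPIDS API.

def yield_ratio(minute_has_data, feature, min_valid_minutes_per_hour=30):
    """Return the fraction of valid minutes or valid hours in one time segment.

    minute_has_data: list of booleans, one per minute of the segment.
    feature: "RATIO_VALID_YIELDED_MINUTES" or "RATIO_VALID_YIELDED_HOURS".
    min_valid_minutes_per_hour: assumed cutoff for calling an hour valid.
    """
    if not minute_has_data:
        raise ValueError("segment must contain at least one minute")

    if feature == "RATIO_VALID_YIELDED_MINUTES":
        return sum(minute_has_data) / len(minute_has_data)

    if feature == "RATIO_VALID_YIELDED_HOURS":
        # Split the segment into blocks of up to 60 minutes; the last block
        # (or the only block, for a short segment) may be shorter than an hour.
        hours = [minute_has_data[i:i + 60] for i in range(0, len(minute_has_data), 60)]
        valid_hours = sum(1 for hour in hours if sum(hour) >= min_valid_minutes_per_hour)
        return valid_hours / len(hours)

    raise ValueError(f"unknown DATA_YIELD_FEATURE: {feature}")


def keep_segment(minute_has_data, config):
    """Keep a segment only if its chosen yield ratio meets the threshold."""
    ratio = yield_ratio(minute_has_data, config["DATA_YIELD_FEATURE"])
    return ratio >= config["DATA_YIELD_RATIO_THRESHOLD"]


# A 30-minute segment with 20 minutes of data.
short_segment = [True] * 20 + [False] * 10

by_minutes = {"DATA_YIELD_FEATURE": "RATIO_VALID_YIELDED_MINUTES",
              "DATA_YIELD_RATIO_THRESHOLD": 0.5}
by_hours = {"DATA_YIELD_FEATURE": "RATIO_VALID_YIELDED_HOURS",
            "DATA_YIELD_RATIO_THRESHOLD": 0.5}

print(keep_segment(short_segment, by_minutes))  # minutes-based check
print(keep_segment(short_segment, by_hours))    # hours-based check
```

Under these assumptions, the minutes-based check keeps the short segment (20 of 30 minutes have data), while the hours-based check drops it because the single partial hour never reaches the 30-minute cutoff. Separating the choice of feature from the threshold value lets short segments be filtered on a unit that fits their length.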