01-edu / public

📚 @01-edu's Public Repository
http://public.01-edu.org/

sp500-strategies subject #2484

Closed jarmo-seljamaa closed 4 months ago

jarmo-seljamaa commented 4 months ago

sp500-strategies

As with other tasks in the AI module, the instructions and audit questions have not been proofread for spelling and clarity.

DataFrame with a Machine learning metrics on train et validation sets on all folds of the train set.

Train et validation?

From the audit:

Does the last validation set of the train set not overlap on the test set?

Possible answers:

The audit offers only a single Yes/No choice, so negated questions are silly to include.

Also this one:

Do all of the folds not contain data from the same day? The split should be done on the dates.

Apart from the language problem (a negated question, plus "do all not contain" can mean that some may contain?), it seems illogical. The data we are using covers 500 different stocks, so we do have 500 data points from the same day. There seems to be no problem with including data from the same day in the folds, because the data comes from different tickers. Perhaps there is something here that we're missing. If so, please explain.
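One possible reading of "the split should be done on the dates" (an assumption, not something confirmed in this thread) is that a given day should never appear in both the train part and the validation part of the same fold, even though many tickers share that day. A minimal sketch of that reading, using made-up tickers and dates and a hypothetical helper `split_on_dates`:

```python
from datetime import date, timedelta

# Hypothetical toy panel: 3 tickers x 10 consecutive days (made-up data).
tickers = ["AAA", "BBB", "CCC"]
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(10)]
rows = [(d, t) for d in days for t in tickers]


def split_on_dates(rows, n_folds):
    """Yield (train_rows, val_rows) pairs, cutting on unique dates, not on row positions."""
    unique_days = sorted({d for d, _ in rows})
    block = len(unique_days) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        # Expanding train window, followed by the next block of days as validation.
        train_days = set(unique_days[: k * block])
        val_days = set(unique_days[k * block : (k + 1) * block])
        train = [r for r in rows if r[0] in train_days]
        val = [r for r in rows if r[0] in val_days]
        yield train, val


folds = list(split_on_dates(rows, n_folds=4))
for train, val in folds:
    # Every ticker's row for a given day lands on the same side of the cut.
    assert {d for d, _ in train}.isdisjoint({d for d, _ in val})
```

Under this reading, same-day rows from different tickers may sit together inside one part of a fold; what the check rules out is a day being cut in half between train and validation.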

nprimo commented 4 months ago

Hi @jarmo-seljamaa, thank you for your feedback! I rephrased the 2 audit questions you pointed out to improve their clarity.

jarmo-seljamaa commented 4 months ago

Thanks! However, the new wording still contains negation:

Is the last validation set of the train data not overlapping with the test data?

Possible answers:

nprimo commented 4 months ago

Would the following be clearer?

Is the last validation set of the train data not overlapping with the test data? Yes (the two sets are not overlapping) or no (the two sets are overlapping).

jarmo-seljamaa commented 4 months ago

I've seen this type of solution in some earlier tasks:

Can you confirm that the last validation set of the train data is not overlapping with the test data?

Not the smoothest, but at least it doesn't have 3 possible answers.
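For what it's worth, the same idea can be expressed as a check whose name is a positive statement, so that "True" has only one meaning. This is just a sketch: the helper `is_strictly_before` and all dates below are hypothetical and not taken from the subject.

```python
from datetime import date


def is_strictly_before(val_dates, test_dates):
    """Return True when every validation date precedes every test date."""
    return max(val_dates) < min(test_dates)


# Hypothetical date ranges, for illustration only.
last_val_dates = [date(2016, 10, 3), date(2016, 12, 30)]
test_dates = [date(2017, 1, 3), date(2017, 6, 30)]

no_overlap = is_strictly_before(last_val_dates, test_dates)
```

Phrasing the audit question the same way ("Does the last validation set end before the test set starts?") would avoid the negation altogether.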