-
After downloading the test dataset from the official website, I found that it is actually 22 GB of data.
Were the 1204 images in the validation set selected by a specific rule? Were they chosen randomly, or by some other criterion?
…
-
Many of the splitters are poorly described, both in their formalism and in their parameters. We should write proper formal descriptions of the splitters.
For a complete description, the docstring…
-
### Training Pipeline Script
**Code Location:** /bg_control/0_meal_identification/meal_identification/meal_identification/modeling/train.py
**Requirements:**
1. A model instantiation function tha…
-
import streamlit as st
import pandas as pd
import numpy as np
import pennylane as qml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
fro…
-
There are multiple ratios that can be chosen to split the data into training and test sets. I would like to propose a feature/algorithm that will assess the data in its entirety and give back/suggest use…
-
Given a folder `data/raw/...` where ... = the downloaded v2d datasets, we need a script to partition (move) them into subdirectories called `train`, `val`, and `test`.
This is because the 4M folder…
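
A minimal sketch of such a partition script, assuming the raw folder is a flat directory of files and that a random split by fraction is acceptable. The function name `partition_dataset`, the fractions, and the seed are hypothetical and not taken from the issue:

```python
import random
import shutil
from pathlib import Path


def partition_dataset(raw_dir, out_dir, val_frac=0.1, test_frac=0.1, seed=0):
    """Move files from raw_dir into out_dir/train, out_dir/val and out_dir/test."""
    raw_dir, out_dir = Path(raw_dir), Path(out_dir)

    # Sort first so the shuffle is reproducible for a given seed.
    files = sorted(p for p in raw_dir.iterdir() if p.is_file())
    random.Random(seed).shuffle(files)

    n_val = int(len(files) * val_frac)
    n_test = int(len(files) * test_frac)
    splits = {
        "val": files[:n_val],
        "test": files[n_val:n_val + n_test],
        "train": files[n_val + n_test:],  # everything left over
    }

    for name, members in splits.items():
        target = out_dir / name
        target.mkdir(parents=True, exist_ok=True)
        for path in members:
            shutil.move(str(path), str(target / path.name))

    # Report how many files ended up in each split.
    return {name: len(members) for name, members in splits.items()}
```

Calling `partition_dataset("data/raw/some_dataset", "data/processed/some_dataset")` would move the files in place; both paths here are placeholders. If the downloads are nested directories rather than flat files, the listing step would need to change.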
-
Hi! Very interesting work! But I think you should disable shuffle when splitting data.
**train_test_split** shuffles data by default; you can pass **shuffle=False** to avoid future data context lea…
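
As a small illustration of the suggestion above, here is a sketch on toy, time-ordered data (the arrays are made up for the example). With shuffling disabled, scikit-learn's `train_test_split` keeps the original order, so the test set is the final block of rows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy time-ordered data: row i is the observation at time step i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# shuffle=False keeps chronological order: the first rows go to train,
# the last rows go to test, so no future rows end up in the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=3, shuffle=False
)
```

Note that `stratify` cannot be combined with `shuffle=False`; scikit-learn raises an error in that case.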
-
I have a time series dataframe containing 80 features, 29922 rows, and the idea was to first use MINIROCKET for feature creation, and then use a linear regressor to reconstruct a target column. I'm d…
-
#### Describe the bug
It appears since 0.23.0 using multiple columns in train_test_split's stratify option results in an error if one column is the pandas nullable int `Int32Dtype()` type. Er…
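
One possible workaround, sketched here under assumptions and not verified against the exact versions in the report: collapse the stratification columns into a single string key and pass that to `stratify`. The column names `group` and `label` and the data are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with one pandas nullable-integer column.
df = pd.DataFrame({
    "group": pd.array([0, 0, 0, 0, 1, 1, 1, 1], dtype="Int32"),
    "label": ["x", "x", "y", "y", "x", "x", "y", "y"],
    "value": range(8),
})

# Build one plain string key per row, so the stratify argument is a
# single column of strings rather than a mixed-dtype 2-D selection.
strat_key = df["group"].astype(str) + "_" + df["label"].astype(str)

train_df, test_df = train_test_split(
    df, test_size=0.5, stratify=strat_key, random_state=0
)
```

Each combination of `group` and `label` is then treated as its own stratum, which matches the intent of stratifying on both columns.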
-
Hello guys, I noticed that in the continued pretraining Colab for the Korean language, the function formatting_prompts_func is not used to map the Wikipedia dataset:
```
def formatting_prompts_func(e…