-
**What happened?**:
I have a pipeline that uses a temporary folder for some of the processing, and found out that this may cause data leakage between datums. The datum data leakage then caused a whole sle…
-
I've been reviewing the data preprocessing steps in `data/data_loader.py` and noticed that the entire dataset undergoes fitting and transformation before being split into training, validation, and tes…
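The concern in this report (statistics fitted on the full dataset before the split) can be illustrated with a minimal sketch. It is not the code from `data/data_loader.py`; the function name `split_then_standardize` and its parameters are hypothetical, and standardization stands in for whatever transform the repository actually fits. The point is only the order of operations: split first, then fit on the training portion alone.

```python
import random
import statistics


def split_then_standardize(values, train_fraction=0.8, seed=0):
    """Split first, then fit scaling statistics on the training part only.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)

    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]

    # Fit on the training split only, so no test information leaks in.
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0  # guard against zero variance

    def scale(xs):
        return [(x - mean) / std for x in xs]

    # The test split is transformed with the training statistics.
    return scale(train), scale(test), (mean, std)
```

With this ordering the test split never influences `mean` or `std`; fitting before the split would let it do so.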
-
Possible data leakage?
In the original dataset, there are several images from the same patient [see for example patient number 2](https://github.com/ieee8023/covid-chestxray-dataset/blob/master/metad…
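One common way to avoid this kind of leakage is to split by patient rather than by image, so that all images of a patient land on the same side. The following is a minimal sketch under assumptions: the record layout `(patient_id, image_path)` and the function name `split_by_patient` are hypothetical and are not taken from the dataset's tooling.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) pairs so no patient spans both sets.

    Hypothetical helper for illustration only.
    """
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patient IDs, not individual images.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p, img in records if p not in test_ids]
    test = [(p, img) for p, img in records if p in test_ids]
    return train, test
```

Because the fraction applies to patients, the share of images in the test set can differ from `test_fraction` when patients have unequal image counts.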
-
We at RiskIQ are authorized representatives of JPMorgan Chase. It has come to our attention that the content located at the following URLs contains accidental data leakage that contains JPMC code/scri…
-
Hi authors, thanks for your excellent work and code!
I found a bug in **base_dataset.py** that may significantly affect the model's performance. In lines 243-247, when **phase** is "val",…
jlidw updated
6 months ago
-
http://www.idappcom.com/db/?9616
-
Thank you for providing these excellent datasets. I am currently using the "Vitamin and Supplements", "Beyaz Perde All Movies", and "Beyaz Perde Best Movies" datasets from this repository for a sentim…
-
Hi! Very interesting work! But I think you should disable shuffle when splitting data.
**train_test_split** shuffles data by default; you can pass **shuffle=False** to avoid future data context lea…
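For time-ordered data, the idea behind disabling the shuffle is that the test set should contain only samples that come after everything in the training set. The sketch below shows that ordered split in plain Python, without scikit-learn; `ordered_split` is a hypothetical helper, and it assumes the input is already sorted chronologically.

```python
import math


def ordered_split(samples, test_fraction=0.25):
    """Keep the original order: earliest samples train, latest samples test.

    Hypothetical helper; assumes `samples` is sorted by time.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")

    n_test = math.ceil(len(samples) * test_fraction)
    cut = len(samples) - n_test

    # No shuffling: everything before `cut` precedes everything after it.
    return samples[:cut], samples[cut:]
```

A shuffled split would mix later samples into the training set, which is the leakage this comment warns about.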
-
Do you by any chance still have the dataset split (train/val/test set) that was used to pretrain ProtT5 UniRef50? I am trying to investigate data leakage for downstream tasks.
-
### Is your feature request related to a problem? Please describe.
Currently, the same S3 bucket is used for both private data (file exports) and public assets (uploaded images). This approach has se…