LedaguenelArthur opened 3 years ago
Hi Arthur!
Our repository is based on Transformers v3.0.0. Unfortunately, GlueDataset and GlueDataTrainingArguments were deleted in the current version of Transformers, v4. However, you can find those two utils in the Transformers source at tag v3.0.0.
Thank you for your interest in our work! Best, WonJin
Thank you very much for the quick answer!
I could not find any documentation on either of these two components at https://huggingface.co/transformers/v3.0.2/index.html for some reason... :/
I would like to plug my own dataset into the neural network, and I wonder how that works with these two components.
Would it be enough to transform my data into the same format as GAD, split it into train.tsv, test.tsv and dev.tsv, and feed it to the GlueDataset processor?
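[For reference, a minimal sketch of what "the same format as GAD" looks like: tab-separated splits with masked entity mentions and a binary label. The exact column layout (header row, `index`/`sentence`/`label` order) is an assumption here and is best verified against the GAD files shipped with the BioBERT repo.]

```python
import csv
import os
import tempfile

# Toy rows in a GAD-like layout: entities masked with @GENE$ / @DISEASE$
# placeholders, binary relation label. (Column layout is an assumption --
# check it against the repo's own GAD train.tsv before relying on it.)
rows = [
    ("0", "The @GENE$ variant was associated with @DISEASE$ risk.", "1"),
    ("1", "No link between @GENE$ and @DISEASE$ was observed.", "0"),
]

data_dir = tempfile.mkdtemp()
for split in ("train", "dev", "test"):
    with open(os.path.join(data_dir, f"{split}.tsv"), "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["index", "sentence", "label"])  # assumed header
        writer.writerows(rows)

print(sorted(os.listdir(data_dir)))  # → ['dev.tsv', 'test.tsv', 'train.tsv']
```

With the three split files in place, `data_dir` is what gets passed to the data arguments so the GlueDataset processor can find them.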
Besides, I am working on Google Colab and therefore run into a problem with HfArgumentParser, since I'm not executing the script from a shell and passing the arguments that way. Instead, I construct the argument objects directly:
    model_args = ModelArguments(
        config_name="bert-base-cased",
        model_name_or_path="dmis-lab/biobert-base-cased-v1.1",
    )
    data_args = DataTrainingArguments(
        task_name="SST-2",
        data_dir=DATA_DIR,
        max_seq_length=MAX_LENGTH,
        overwrite_cache=False,
    )
    training_args = TrainingArguments(
        per_device_train_batch_size=BATCH_SIZE,
        save_steps=SAVE_STEPS,
        seed=SEED,
        do_train=True,
        do_predict=True,
        learning_rate=5e-5,
        output_dir=OUTPUT_DIR,
        overwrite_output_dir=True,
    )
but I get the following error:
[Errno 2] No such file or directory: './GAD/1/cached_train_BertTokenizer_128_sst-2.lock'
Is there a solution for that? What am I doing wrong?
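[For readers hitting the same error: the missing `.lock` file usually points at `data_dir` itself not existing. GlueDataset writes its tokenized-feature cache, and a lock file guarding it, inside `data_dir`, so if that directory is absent the lock cannot be created and the open fails with Errno 2. A pre-flight check, sketched with a throwaway directory standing in for your own `DATA_DIR`:]

```python
import os
import tempfile

# Hypothetical stand-in for DATA_DIR; replace with your real path.
DATA_DIR = os.path.join(tempfile.mkdtemp(), "GAD", "1")
os.makedirs(DATA_DIR, exist_ok=True)  # ensure the directory exists first

# List which split files are still missing before building the dataset.
missing = [
    f"{split}.tsv"
    for split in ("train", "dev", "test")
    if not os.path.isfile(os.path.join(DATA_DIR, f"{split}.tsv"))
]
print(missing)  # → ['train.tsv', 'dev.tsv', 'test.tsv'] for an empty dir
```

If `missing` is non-empty, fixing the path (or copying the split files there) should make the lock-file error go away.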
I'm happy to clarify if my question is unclear. Thank you again, Best regards, Arthur Ledaguenel
Hi Arthur,
I got the same error:
No such file or directory: '../data/RE/euadr/1/cached_train_BertTokenizerFast_128_sst-2.lock'
Is there a solution?
Thank you, Best regards, Clouwer
@clouwer I'm facing the same error. Did you happen to find a solution?
Hi everyone,
I'm currently trying to understand how to plug my own data into the BioBERT model in PyTorch for relation extraction, and I see that the run_re.py script uses two utils for which I can't find documentation: GlueDataset & GlueDataTrainingArguments.
Does anyone know where to find the documentation for these two utilities? Or can anyone briefly explain how they work?
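[Briefly, in Transformers v3.0.0 GlueDataTrainingArguments is a small dataclass of preprocessing options, and GlueDataset reads `train.tsv` / `dev.tsv` / `test.tsv` from `data_dir`, tokenizes them with the task's processor, and caches the features. A rough sketch of the arguments object, with the field set inferred from how run_re.py constructs it; the v3.0.0 source at the tag is the authoritative definition:]

```python
from dataclasses import dataclass

# Sketch of the fields GlueDataTrainingArguments exposes in Transformers
# v3.0.0 (inferred from usage in run_re.py; check the tagged source).
@dataclass
class GlueDataTrainingArgumentsSketch:
    task_name: str                  # which GLUE-style processor to use, e.g. "SST-2"
    data_dir: str                   # directory holding train.tsv / dev.tsv / test.tsv
    max_seq_length: int = 128       # sequences truncated/padded to this length
    overwrite_cache: bool = False   # re-tokenize even if a cached file exists

args = GlueDataTrainingArgumentsSketch(task_name="SST-2", data_dir="./GAD/1")
print(args.max_seq_length)  # → 128
```

BioBERT reuses the "SST-2" processor because, like SST-2, its RE data is single-sentence binary classification, which is why the cache files carry an `sst-2` suffix.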
Thanks a lot, Best regards, Arthur Ledaguenel