ThilinaRajapakse / pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Apache License 2.0
306 stars, 97 forks

Can I have more columns in the train set? #43

Open shainaraza opened 4 years ago

shainaraza commented 4 years ago

Other than the format specified below, can I have more columns as features?

- guid: An ID for the row.
- label: The label for the row (should be an int).
- alpha: A column of the same letter for all rows. Not used in classification but still expected by the DataProcessor.
- text: The sentence or sequence of text.
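For reference, rows in the four-column layout listed above can be written with Python's standard `csv` module. This is a minimal sketch, assuming the training file is tab-separated with no header row; the helper name `write_train_tsv` and the sample rows are hypothetical, and the constant letter `"a"` stands in for the alpha column.

```python
import csv
import io


def write_train_tsv(rows, out):
    """Write (label, text) pairs in the guid / label / alpha / text layout.

    Assumes tab-separated output with no header row (hypothetical helper).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for guid, (label, text) in enumerate(rows):
        # guid is the row index, label is forced to int, alpha is a constant filler letter
        writer.writerow([guid, int(label), "a", text])


buf = io.StringIO()
write_train_tsv([(1, "great movie"), (0, "boring plot")], buf)
print(buf.getvalue())
```

Writing to a `StringIO` keeps the example self-contained; passing an open file handle instead would produce the actual `.tsv` file.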

ThilinaRajapakse commented 4 years ago

Not without creating your own model class. Transformer models only accept a sequence of text as their input.

shainaraza commented 4 years ago

Thank you very much for your reply. Can I make some changes here?

    def __init__(self, input_ids, input_mask, segment_ids, label_id):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.label_id = label_id
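If the aim is to carry extra per-example features alongside the tokenized text, the feature container in that `__init__` could gain one more field. A minimal sketch, assuming the extra features are already numeric; the class name `InputFeaturesWithExtras` and the `extra_features` field are hypothetical, not part of the repository. Storing the field does not by itself make the model use it; the model class would also have to consume it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputFeaturesWithExtras:
    """Hypothetical feature holder: the original four fields plus numeric extras."""
    input_ids: List[int]
    input_mask: List[int]
    segment_ids: List[int]
    label_id: int
    # Extra, non-text features (e.g. counts or scores), assumed already numeric.
    extra_features: List[float] = field(default_factory=list)


feat = InputFeaturesWithExtras(
    input_ids=[101, 2307, 3185, 102],  # illustrative token ids only
    input_mask=[1, 1, 1, 1],
    segment_ids=[0, 0, 0, 0],
    label_id=1,
    extra_features=[0.5, 3.0],
)
print(feat.extra_features)
```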

ThilinaRajapakse commented 4 years ago

I'm not sure where that piece of code is from. Essentially, you'll need to edit the BertForSequenceClassification class in the transformers library so that it can accept additional inputs. You'll also need to write the forward() function to handle the inputs.
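To illustrate what such a modified forward() might compute, here is a framework-free sketch. It assumes one common design, not necessarily the one intended above: the pooled text representation produced by the transformer is concatenated with the extra numeric features, and a single linear layer maps the result to class logits. The function name `classify_with_extras` and the toy weights are hypothetical; in a real model these would be PyTorch tensors and an `nn.Linear` layer whose input size is the hidden size plus the number of extra features.

```python
from typing import List


def classify_with_extras(
    pooled: List[float],
    extra: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """Return class logits for one example (hypothetical helper).

    pooled:  pooled text representation (hidden_size values)
    extra:   additional numeric features for the same example
    weights: num_labels rows, each of length hidden_size + len(extra)
    bias:    one value per label
    """
    combined = pooled + extra  # concatenate text and non-text features
    if any(len(row) != len(combined) for row in weights):
        raise ValueError("each weight row must match hidden_size + number of extra features")
    # Linear layer: logits[k] = sum_j weights[k][j] * combined[j] + bias[k]
    return [
        sum(w * x for w, x in zip(row, combined)) + b
        for row, b in zip(weights, bias)
    ]


# Toy example: hidden_size = 2, two extra features, two labels.
logits = classify_with_extras(
    pooled=[0.5, -1.0],
    extra=[2.0, 1.0],
    weights=[[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]],
    bias=[0.0, 0.5],
)
print(logits)
```

In PyTorch the concatenation step would typically be `torch.cat` along the last dimension before the classifier layer; whether to scale the extra features or apply dropout to them is a design choice this sketch leaves open.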

shainaraza commented 4 years ago

Thank you, ThilinaRajapakse, for your great work and timely responses. I am using this library and will definitely acknowledge and cite you in my upcoming work. All the best!

ThilinaRajapakse commented 4 years ago

No problem!

Take a look at Simple Transformers as well. You may find it easier to work with compared to this repo.

shainaraza commented 4 years ago

Yes, I am using Simple Transformers too; it's super easy to use. I am currently using Google Colab, and sometimes I get the error "RuntimeError: CUDA error: device-side assert triggered". Which cloud GPU services would you suggest? My dataset is about 2 GB. Thanks in advance.

ThilinaRajapakse commented 4 years ago

That error usually happens when there is bad data in your dataset (invalid labels, special characters, etc.).
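Since invalid labels are a frequent culprit, one cheap check before training is to confirm that every label is an integer in the range `0` to `num_labels - 1`; an out-of-range class index passed to the loss function is a typical trigger for that assert on the GPU. A minimal sketch in plain Python; the function name `find_bad_labels` is hypothetical, and it checks only labels, not the other data problems mentioned above.

```python
import numbers
from typing import Iterable, List, Tuple


def find_bad_labels(labels: Iterable[object], num_labels: int) -> List[Tuple[int, object]]:
    """Return (row_index, label) pairs whose label is not an int in [0, num_labels)."""
    bad = []
    for i, label in enumerate(labels):
        # numbers.Integral also accepts NumPy integer types; bool is rejected explicitly
        is_int = isinstance(label, numbers.Integral) and not isinstance(label, bool)
        if not is_int or not 0 <= label < num_labels:
            bad.append((i, label))
    return bad


print(find_bad_labels([0, 1, 2, 1, -1, "1"], num_labels=2))
```

An empty result means every label is usable for a `num_labels`-class model; anything else points at the rows to fix.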

I don't use cloud GPUs so I'm afraid I can't really recommend any.

shainaraza commented 4 years ago

Thanks again, ThilinaRajapakse, for your timely response. I agree with you about the data. One last question for today: can I run Simple Transformers on a CPU? When you tested and built all these models, did you use GPUs or just a CPU?

ThilinaRajapakse commented 4 years ago

You can run them on either. However, running on CPU will be far too slow for it to be practical. I always train using a GPU.

shainaraza commented 4 years ago

Thanks, all the best to you.