Open · EhsanM4t1qbit opened this issue 3 years ago
Have you finished the fine-tuning task you describe above? I'm also interested in it, and I have the same questions as you. I'd appreciate it if you could share how you solved them. :)
I am also interested in a similar task. Did you figure it out? I noticed that the context encoding size equals the number of input words + 1. Maybe that extra token is the CLS token.
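One quick way to test that guess is to compare the encoding's sequence length against the token count. This is only a rough sketch: `model` and `table` stand in for however you load the pretrained model and construct a table, and the three-value return of `encode()` is assumed from the homepage example the question below refers to.

```python
# Rough sketch: `model` and `table` are placeholders, and the return signature
# of encode() is assumed to be (context_encoding, column_encoding, info_dict).
tokens = "show me the population of each country".split()  # stand-in for real tokenization

context_encoding, column_encoding, info = model.encode(contexts=[tokens], tables=[table])
print(len(tokens), context_encoding.shape)
# If the sequence dimension equals len(tokens) + 1, the extra position is likely
# a prepended [CLS]-style token, and its vector would be context_encoding[:, 0].
cls_candidate = context_encoding[:, 0]
```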
Hi, thanks for this great work. I'm trying to use your package for a binary classification task, and I'd like your feedback on how I'm using it. The task is to determine whether a given context belongs to a given table (binary 0/1). Following the example posted on the homepage, I am using the `encode()` method to get the context and table embeddings, concatenate and average them, and then pass the averaged tensor through a linear layer followed by a sigmoid and cross-entropy loss. I have a few specific questions:

1. Is `encode()` the right method for training, or is it meant to be used for inference only?
2. `encode()` receives lists of lists as contexts and tables. By inspecting the code, I can see that they are converted to tensors internally. Does this mean I can't use a PyTorch `Dataset` in my training loop? Currently, I'm using plain lists.
3. I couldn't figure out how to access the output representation for the `CLS` token. As mentioned, I am concatenating and averaging the table and context embeddings. Is there a `CLS` token I can use instead?
4. Currently, I get a CUDA out-of-memory error after 617 steps with batch size 8. I know this is caused by the batches of table and context data inside my training loop: if I move them out of the loop and feed the model a single batch over and over, the problem goes away.

Here is a snippet of my code that captures the process.
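In place of the missing snippet, here is a minimal sketch of the pipeline described above. It is not the original code: it assumes `encode()` takes `contexts` and `tables` as plain lists and returns context and column encodings of the same hidden size (plus an info dict), and `model`, `train_batches`, and `hidden_size` are placeholders for the real values.

```python
import torch
import torch.nn as nn

# Minimal sketch, not the original snippet. `model` is the pretrained encoder
# and `train_batches` yields (contexts, tables, labels) with contexts/tables as
# plain lists, per question 2. The encode() signature is assumed.
hidden_size = 768  # assumption: replace with the model's actual hidden size
classifier = nn.Linear(hidden_size, 1)
params = list(model.parameters()) + list(classifier.parameters())
optimizer = torch.optim.Adam(params, lr=1e-5)
loss_fn = nn.BCEWithLogitsLoss()  # numerically stable sigmoid + cross-entropy

for contexts, tables, labels in train_batches:
    context_enc, column_enc, _ = model.encode(contexts=contexts, tables=tables)
    # Concatenate along the sequence dimension, then average into one vector
    # per example, matching the "concatenate and average" step above.
    pooled = torch.cat([context_enc, column_enc], dim=1).mean(dim=1)  # (batch, hidden)
    logits = classifier(pooled).squeeze(-1)
    loss = loss_fn(logits, labels.float())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Re question 4 (OOM): log with loss.item() and drop references to the
    # encodings each step so the autograd graph can be freed.
    running_loss = loss.item()
```

Using `BCEWithLogitsLoss` instead of a separate sigmoid followed by cross entropy is just the standard PyTorch way to get the same objective with better numerical stability.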
Your feedback is greatly appreciated.