Closed sdspieg closed 1 year ago
Could you give an example of the code you're running and the error you're getting? What are you doing to load the JSON or CSV?
I made a colab demonstrating how to use the library here: .
This colab is just a copy/paste of the contents of the docs page for this repo, but you can see that it works and is easy to use.
It sounds like your issue is that you're struggling to parse CSV or JSON with Python, which is out of the scope of this library. There are a lot of resources online about how to parse JSON or CSV with Python, so I won't try to diagnose here without seeing your specific code. Whatever you want to parse using frame-semantic-transformer, you just need to get it into a Python string, and then pass that string into this library.
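For reference, a minimal sketch of that parsing step using only the Python standard library; the column/field name `resolved_text` is just an example, not anything this library requires:

```python
import csv
import io
import json

# Example CSV content; in practice you would use open("your_file.csv") instead.
csv_data = io.StringIO(
    "id,resolved_text\n"
    "1,The first sentence of the corpus.\n"
    "2,The second sentence of the corpus.\n"
)

# Each row becomes a dict keyed by the header; pull the text column out as plain strings.
sentences = [row["resolved_text"] for row in csv.DictReader(csv_data)]

# The JSON equivalent: a list of objects with the same field.
json_data = '[{"resolved_text": "A sentence loaded from JSON."}]'
sentences += [obj["resolved_text"] for obj in json.loads(json_data)]

# `sentences` is now a list of plain Python strings, ready to pass one at a
# time into frame_transformer.detect_frames(...).
print(sentences)
```

Whatever the source format, the goal is the same: end up with plain Python strings.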
I'm going to close this issue now. If you continue to have problems, you can open a new issue. However, please recognize the following:
I am sorry you took offense at my title. It really just referred to the fact that I am not a CS person, and I am deeply in awe of the NLP pros. But here is my Python code, and I did manage to (sort of) get it to work now:
import pandas as pd
from frame_semantic_transformer import FrameSemanticTransformer
from tqdm import tqdm

# df was loaded from my corpus earlier; limit it to the first 10 rows
df = df.head(10)

# Initialize the frame transformer
frame_transformer = FrameSemanticTransformer()

# Loop through each sentence in the DataFrame and display a progress bar
for sentence in tqdm(df['resolved_text_lemmatized']):
    result = frame_transformer.detect_frames(sentence)
    print(f"Results found in: {result.sentence}")
    for frame in result.frames:
        print(f"FRAME: {frame.name}")
        for element in frame.frame_elements:
            print(f"  {element.name}: {element.text}")
Ok - having played around with it a bit more, I can confirm that this is a) really usable even for non-experts (like myself - at least with the infinitely patient assistance of ChatGPT); and b) quite impressive. And that's just using the default settings.
Now I'm not sure whether it's ok to ask questions like this in a 'closed' topic, but I'll run the risk. I have a dataframe of 167,413 sentences to which I applied (so far just intrasentential) coreference resolution, stored in a resolved_text column. That's the column I fed into FrameSemanticTransformer. I let the code run on the first 1000 sentences, which took a little over 4 hours but really did generate very useful insights. At that rate, though, running the code on my entire corpus would - by my calculations - take about a month of continuous compute...
So, in one word: help? I could possibly run this on a machine with an AMD Ryzen 7 3700X CPU and 96 GB of RAM, as well as an Nvidia GeForce RTX 4090 GPU with 24 GB of VRAM - would this code work on that? And how would I make sure that my code squeezes as much out of that system as possible? [And again - please forgive my clumsiness in expressing myself in the right terms, but I AM trying and I would genuinely love to get this to work!]
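For the record, a back-of-envelope extrapolation from the numbers stated above (4 hours for 1,000 sentences, 167,413 sentences total), assuming throughput stays constant:

```python
# Observed throughput on the first batch
sentences_done = 1_000
hours_taken = 4.0

# Scaling linearly to the full corpus
total_sentences = 167_413
seconds_per_sentence = hours_taken * 3600 / sentences_done   # 14.4 s/sentence
total_hours = total_sentences * seconds_per_sentence / 3600  # ~670 hours
total_days = total_hours / 24                                # ~28 days

print(f"{seconds_per_sentence:.1f} s/sentence, ~{total_days:.0f} days for the full corpus")
```

Roughly a month of wall-clock time at CPU speed, which is exactly why moving to a GPU and batch processing matters here.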
Your pandas code looks like it should work, but 1000 sentences in 4 hours is extremely slow. Are you running this on a GPU or a CPU? If it's CPU, I could imagine it being that slow, but any decent Nvidia GPU with CUDA should be much, much faster than that. A 4090 is a super powerful GPU and should easily handle this task with no problems.
I also realize there's currently no method to batch-process multiple sentences in parallel, which would speed things up further on a GPU. I'll work on adding support for that in the meantime as well.
Great! I have just purchased a 4090 (these GPU prices are just nuts!), but have also just realized that it won't fit in my computer. So I'll have to get myself another case tomorrow too :)
But so the code I have will automatically use CUDA? Or do I have to add something?
At any rate - thanks much for that response and we're looking forward to batch processing as well, because we have some corpora that are quite a bit bigger than that...
You'll need to make sure you have CUDA and CUDNN installed, and the correct PyTorch version installed for your version of CUDA. I haven't ever done this before, so you'll need to figure out how to do that. There should be lots of tutorials online though. If you do that then this library should automatically detect your GPU and use it.
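As a quick sanity check that PyTorch can actually see the GPU, something like the following works; the try/except is only there so the snippet degrades gracefully if torch isn't installed:

```python
# Check whether PyTorch was built with CUDA support and can see a GPU.
try:
    import torch
    has_cuda = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if has_cuda else "CPU only"
except ImportError:
    has_cuda = False
    device_name = "torch not installed"

print(f"CUDA available: {has_cuda} ({device_name})")
```

If this prints False even though the drivers are installed, the usual culprit is a CPU-only PyTorch build; reinstalling the wheel that matches your CUDA version fixes it.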
v0.6.0 adds a detect_frames_bulk() method which you can use to do batch processing of sentences. There's also a batch_size param you can pass when creating a FrameSemanticTransformer instance to determine how much is processed per batch. If you have a GPU set up, you can experiment with the batch size to maximize the number of sentences the GPU can process in parallel before it runs out of memory. There's also a use_gpu param you can pass if you want to force it to use a GPU, or throw an error if it can't find one. For example:
parser = FrameSemanticTransformer(use_gpu=True, batch_size=16)
sentences = [
    "This is the first sentence.",
    "This is the next sentence.",
    ...
]
results = parser.detect_frames_bulk(sentences)
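For a corpus the size of the one discussed above, it may also be worth wrapping detect_frames_bulk() in a loop that processes the corpus in chunks and persists intermediate results, so a crash doesn't lose hours of work. A sketch; the chunk size and the checkpointing strategy are my own suggestions here, not part of the library's API:

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process_corpus(parser, sentences, chunk_size=1000):
    """Run detect_frames_bulk() over `sentences` one chunk at a time.

    `parser` is assumed to be a FrameSemanticTransformer instance and
    `sentences` a list of plain strings, as in the example above.
    """
    all_results = []
    for chunk in chunked(sentences, chunk_size):
        results = parser.detect_frames_bulk(chunk)
        all_results.extend(results)
        # Persist progress here (e.g. write one JSON/pickle file per chunk)
        # so a crash only loses the current chunk, not everything.
    return all_results
```

The batch_size param still controls how many sentences hit the GPU at once; the chunk size here only controls how often you get a checkpoint.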
This looks quite fascinating, but when I try to run it on some of our corpora (as opposed to just running some sentences through your demo), it always throws errors that neither your documentation nor ChatGPT can (so far) resolve. Therefore: could you share a Jupyter Notebook or Google Colab that takes a text field/column from a JSON or CSV file as input and generates semantic frames? Thanks!