Closed BruceKenas closed 4 months ago
I tried to reproduce the experiment with Google Colab, but when I run `meerqat/data/loading` it has a problem with this line of code:
`from meerqat import __file__ as ROOT_PATH`
The issue is that I can't find `__file__` in the meerqat folder. Can you help me with this? Many thanks.
Hi,
`__file__` is a special variable. It gives the path of the `__init__.py` file, i.e. the entry point of the `meerqat` package, if you installed it. Did you install it following the instructions (https://paullerner.github.io/ViQuAE/#installation)? A quick fix would be to remove this line because you should not need it to reproduce the experiments (if you're trying to run `meerqat.data.loading map|passages`). Or set `ROOT_PATH=/path/to/ViQuAE/meerqat/__init__.py`
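For reference, a quick check of what it resolves to (assuming `meerqat` is importable):

```python
import meerqat

# for a package, __file__ is the path of its __init__.py
print(meerqat.__file__)  # e.g. /path/to/ViQuAE/meerqat/__init__.py
```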
Dear PaulLerner,
Thank you very much for your advice, I'll try it and get back to you soon!
Bruce.
Dear PaulLerner,
Can this project (Experiment.rst) be run on Google Colab following the instructions you've shown?
Thank you.
Hi Bruce,
I’m not sure actually, can you save data to disk on Colab?
Bests,
Paul
Hi again,
So I made this minimal example and everything seems to be working fine. What is giving you trouble exactly? https://colab.research.google.com/drive/1oTrlVWTdy4-uBUH4X0J1Iyw5-jdjc60J?usp=sharing
Hi Paul,
I really appreciate your help. I'm studying the paper "Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering" and trying to do Experiment.rst on Google Colab as an exercise on the topic of KVQAE at university.
I have a problem with git clone for the datasets, because Google Colab gives an error like:
"FileNotFoundError: Directory /content/ViQuAE/data/viquae_passages is neither a Dataset directory nor a DatasetDict directory."
My instructor gave me the task of reproducing the KVQAE code and presenting it to him, so I'm trying to learn how to do it with Colab.
I'll check out the link you sent me. Thank you so much.
Hi again,
Can you check some issues in this Google Colab? I've made the Colab file editable so it can be fixed.
https://colab.research.google.com/drive/1_UBeP9Z4uoeGV5tX1VU95x-MdYW4Bw-r?usp=sharing
Many thanks.
Hey, sorry for the slow answer (I don't usually work on weekends).
So there are many errors in your notebook, let's try to solve them one by one.
For the images, I believe I fixed it: https://colab.research.google.com/drive/1oTrlVWTdy4-uBUH4X0J1Iyw5-jdjc60J?usp=sharing
For the passages, try using `spacy==2.2.4` as written in the requirements (https://github.com/PaulLerner/ViQuAE/blob/main/requirements.txt); the same goes for every dependency error :)
But you can use the already pre-processed passages available here: https://huggingface.co/datasets/PaulLerner/viquae_passages
(I'll try to work on a more pythonic interface in the future, feel free to open a PR :))
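As a minimal sketch of using the pre-processed passages (assuming, as your loading error above suggests, that the repository holds a dataset saved with `save_to_disk`, and that git-lfs is installed), as Colab cells:

```python
# clone the pre-processed passages from the Hub (needs git-lfs), then load from disk
!git clone https://huggingface.co/datasets/PaulLerner/viquae_passages

from datasets import load_from_disk
passages = load_from_disk("viquae_passages")
print(passages)
```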
Hi Paul,
I really appreciate your help, thank you very much. I will proceed to fix my errors according to your instructions and notify you if something goes wrong.
Hi again,
Can I ask for a Google Colab demo for the KVQAE task, where we run the model, input images and questions from the dataset, and the output is the answer from the KB?
Thank you so much if you can help.
Hi Bruce,
Do you mean with training the models or inference only? It will take some time in any case… You don't have any compute power other than Colab? Which part are you having trouble with?
Bests,
Paul
Yes, training the models with inference for question answering over the KB.
To be honest, the only compute power I can choose in my situation right now is Google Colab. I can certainly wait for this.
Thank you so much.
Do you mean to reproduce the experiments of the SIGIR paper (https://hal-cea.archives-ouvertes.fr/hal-03650618/)? Or the ECIR one (https://arxiv.org/abs/2301.04366)? Or both? Do you mean to reproduce both the Information Retrieval and Reading Comprehension steps, or only one of them? (e.g. I could provide the output of Information Retrieval if you want to focus on Reading Comprehension, or the image embeddings, or any step really…)
Have you tried to download all images of the Knowledge Base (https://huggingface.co/datasets/PaulLerner/viquae_all_images)? I'm afraid that it would exceed your Google Colab quota…
I mean the SIGIR paper; the ECIR one I'll try later with your instructions in Experiment.rst.
Yes, I want both the Information Retrieval and Reading Comprehension steps.
Thanks a lot.
I've tried to download all the images but I exceeded the Google Colab quota. So I only tried with: `git clone https://huggingface.co/datasets/PaulLerner/viquae_images`
I feared so… https://huggingface.co/datasets/PaulLerner/viquae_images is only the images of the dataset, which contextualize the visual questions, not the images of the Knowledge Base… So you won't be able to reproduce the entire information retrieval experiments. As a workaround, I can maybe provide the already-embedded images (with ImageNet-ResNet, ArcFace…), if you have enough space. What is your disk quota? If it is not enough, I can provide the results of the Information Retrieval so you will be able to reproduce the Reading Comprehension step.
Google Colab can mount Google Drive for storage, so I can buy more disk quota for this. I think I can buy 2TB. Is that okay?
Thank you
Yes, this is plenty! The images themselves are about 100GB; however, since they are tracked with git, https://huggingface.co/datasets/PaulLerner/viquae_all_images should be about 200GB (or maybe there is some way to download only the files).
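One possible way to fetch only the files without the git history is `huggingface_hub.snapshot_download`; a sketch under that assumption (the target folder is illustrative):

```python
from huggingface_hub import snapshot_download

# downloads only the files of the latest revision, skipping the git history
snapshot_download(
    repo_id="PaulLerner/viquae_all_images",
    repo_type="dataset",
    local_dir="/content/drive/MyDrive/viquae_all_images",
)
```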
So now I am going to buy the 2TB disk quota for Google Drive and download https://huggingface.co/datasets/PaulLerner/viquae_all_images to the disk.
Thank you deeply for your help,
You're welcome :) So, was that what was preventing you from reproducing the results, or do you still need help?
I want to reproduce the Information Retrieval and Reading Comprehension steps of the SIGIR paper on Google Colab. Can you show me how to do it?
Well, I can help, but I don't have the required Google Colab storage quota. I did my best to provide step-by-step instructions: https://paullerner.github.io/ViQuAE/#id1
Thank you. I'll try with your instructions again; if I run into any problem, I'll tell you.
Hi again,
If I want to perform the experiment of the SIGIR paper, which has 2 parts (IR and Reading Comprehension), which parts of https://paullerner.github.io/ViQuAE/#id1 must I follow, and which parts can I skip?
Currently, I'm following https://paullerner.github.io/ViQuAE/#id1 step by step. I found some errors while doing it and need your help.
1/ When I reach the "Find relevant passages in the IR result" step, I get the error shown in the image attached to this mail: `KeyError: 'search_indices'`
2/ At the "Global Image embedding" step, a lot of pictures are expected in the data/Commons folder, but I can't find the Commons folder or where to clone it. Is that viquae_images (the full version)? Can you show me how to fix it? The errors can be found at the Colab link below and in the attached picture. The problem with the "data/Commons" images folder is the same at the "Face detection" step.
3/ At the "Face recognition" step I installed the arcface_torch library, but something seems to be wrong and I can't use it. I tried:
```
git clone https://github.com/PaulLerner/insightface.git
cd insightface
git checkout chore/arcface_torch
cd recognition
pip install -e .
```
Error during installation: `content/insightface does not appear to be a python project: neither 'setup.py' nor 'pyproject.toml' found`
You can check what I'm doing at: https://colab.research.google.com/drive/1q21EPWbAIOnUoZaEs6aWrjNZ0SclE6E6?usp=sharing
This Colab is open for editing, so you can edit it and check out the problem.
Thank you so much.
Bruce
Hi Bruce,
Again, I cannot do all the experiments on Colab because I have limited disk quota, but I'll try to fix your errors. I'm not sure what you can skip, except pre-processing the passages, which are available here: https://huggingface.co/datasets/PaulLerner/viquae_passages, as explained in https://paullerner.github.io/ViQuAE/#the-viquae-knowledge-base-kb
1/ https://paullerner.github.io/ViQuAE/#find-relevant-passages-in-the-ir-results: here `'search_indices'` refers to the results of the IR, as explained in the comments. So you need to run it after BM25 to train DPR, or after the fusion of DPR and image retrieval to train the reader.
2/ I think you're just missing `cat parts/* > images.tar.gz`; it is commented out in your code. https://huggingface.co/datasets/PaulLerner/viquae_all_images contains all images, of both the KB and the questions. Check that you can load an image from the dataset and the KB as I showed previously in https://colab.research.google.com/drive/1oTrlVWTdy4-uBUH4X0J1Iyw5-jdjc60J?usp=sharing. You should be able to load the image from `$VIQUAE_IMAGES_PATH/<file title stored in the datasets>`.
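For instance, something like this (a rough sketch; the dataset path and the name of the column holding the file title are assumptions here):

```python
import os
from PIL import Image
from datasets import load_from_disk

dataset = load_from_disk("data/viquae_dataset")["test"]  # illustrative path
item = dataset[0]
# assumed column name: adapt to whichever column stores the image file title
file_title = item["image"]
image = Image.open(os.path.join(os.environ["VIQUAE_IMAGES_PATH"], file_title))
```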
3/ I think you need to use `%cd` instead of `!cd`, see the same notebook as above.
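The reason, as a sketch of the insightface install written as Colab cells: `!cd` runs in a throwaway subshell, so pip ended up running in /content/insightface instead of the recognition/ subfolder, while `%cd` persists across cells.

```python
%cd /content
!git clone https://github.com/PaulLerner/insightface.git
%cd insightface
!git checkout chore/arcface_torch
# `%cd` changes the notebook's working directory for the commands below, unlike `!cd`
%cd recognition
!pip install -e .
```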
Hey, so, I just realized that, for the "Find relevant passages in the linked wikipedia article" bit, you need to have the wikipedia articles in a single `Dataset` instead of a `DatasetDict` split according to the entity type. So, you need to concatenate all splits of https://huggingface.co/datasets/PaulLerner/viquae_wikipedia and then follow the instructions at Preprocessing passages. The already-processed passages are not usable as is (they are compatible with `viquae_wikipedia` before splitting according to entity type). Sorry about that, I'll update the docs and provide code tomorrow.
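In the meantime, a rough sketch of what I mean (assuming you cloned viquae_wikipedia and load it from disk; the splits are whatever the `DatasetDict` contains):

```python
from datasets import load_from_disk, concatenate_datasets

# viquae_wikipedia is a DatasetDict with one split per entity type
kb = load_from_disk("data/viquae_wikipedia")  # illustrative path
# merge every split back into a single Dataset before preprocessing passages
wikipedia = concatenate_datasets([kb[split] for split in kb])
wikipedia.save_to_disk("data/viquae_wikipedia_merged")
```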
Here you go: https://colab.research.google.com/drive/1oTrlVWTdy4-uBUH4X0J1Iyw5-jdjc60J?usp=sharing

Hi Paul,
Thanks a lot for your help. I'm checking it right now and will inform you if something goes wrong.
Many thanks.
Bruce
Hi Paul,
I'm learning a lot from this project and trying to learn more. Thank you a lot for helping me.
If I want to perform a query and get results like Figure 6 of the SIGIR paper "ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities", how can I do that? Can you show me how to perform it on Google Colab, e.g. I query with input questions and images and receive the results?
"Figure 6: Queries along with the top-3 results of multimodal IR. The answer (in the relevant passage) is printed in bold font and plausible answers in irrelevant passages are printed in italic. Face landmarks and bounding boxes, if detected, are shown in red. The passage of text has been shortened due to space constraints".
Glad you're able to run code! Don't hesitate to share your notebook or even open a PR :)
So, IR is done using `meerqat.ir.search` as instructed here. Then, results are stored in two ways (you can use either one of them):
- in the `Dataset`, under `<index>_indices`, e.g. `fusion_indices` for the multimodal IR (in v3; this will be removed in future versions)
- in a `.trec` file stored in the directory you specified with `--metrics`, compatible with `trec_eval` and `ranx` (you should use `ranx` in Python, see https://amenra.github.io/ranx/)

So the indices represent the index of the passages, or whatever KB you're using. Then it's just a matter of formatting the data; I don't have code for that (Figure 6 in SIGIR is done in LaTeX). An easy way to visualize images is through HTML, using the image URLs.
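For example, a rough sketch of reading the top results back from the `Dataset` (the paths, split, and passage column name are illustrative):

```python
from datasets import load_from_disk

dataset = load_from_disk("data/viquae_dataset")["test"]  # illustrative paths
passages = load_from_disk("data/viquae_passages")

item = dataset[0]
# `fusion_indices` holds the KB indices of the passages retrieved by multimodal IR
for i in item["fusion_indices"][:3]:
    print(passages[int(i)]["passage"][:100])  # assumed passage column name
```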
Oh, actually I do have code for that, see https://github.com/PaulLerner/ViQuAE/blob/main/meerqat/viz/html.py
It's quite limited as it will only get the top-1 of the results (provided via `ranx`), but you can use this as a starting point.
Hi Paul, thank you for your help. I wonder: the code at https://github.com/PaulLerner/ViQuAE/blob/main/meerqat/viz/html.py is run after we run https://paullerner.github.io/ViQuAE/#ir, right?
I tried to perform the IR in this notebook and still haven't finished it:
https://colab.research.google.com/drive/164RnkIr9XdkTg2ET01f7Yq8pdu7rTnrp?usp=sharing
It looks like I got some errors.
Can you help me with this problem? Thanks for reading.
> I wonder the code at link https://github.com/PaulLerner/ViQuAE/blob/main/meerqat/viz/html.py is running after we run https://paullerner.github.io/ViQuAE/#ir, right?

Yes, it's a means to visualize the output of IR.

> https://colab.research.google.com/drive/164RnkIr9XdkTg2ET01f7Yq8pdu7rTnrp?usp=sharing

BM25 relies on an Elasticsearch server. See the instructions here: https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html
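For reference, a common recipe to run it inside Colab, as a sketch (the version is illustrative; check the page above for a current one):

```python
# download and extract Elasticsearch (version illustrative)
!wget -q https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.1-linux-x86_64.tar.gz
!tar -xzf elasticsearch-7.10.1-linux-x86_64.tar.gz
# Elasticsearch refuses to run as root (which Colab uses), so run it as the daemon user
!chown -R daemon:daemon elasticsearch-7.10.1
!sudo -u daemon elasticsearch-7.10.1/bin/elasticsearch -d  # -d: detach as a daemon
import time; time.sleep(30)  # give the server time to boot
!curl -s http://localhost:9200  # should print cluster info as JSON
```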
Oh hi, I found the .trec file after executing the BM25 command. The results in this file look like this:
```
0033878fb91e955d22822f40cec7c2e1 Q0 11621715 1 -1.2944598574708164 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 1879592 2 -1.295427373876715 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 8869279 3 -1.2960834388883475 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 7167228 4 -1.3074305601851612 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 8622366 5 -1.3104105449031884 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 3208145 6 -1.3107245603868698 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 11463063 7 -1.3118529306687317 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 7078077 8 -1.317489824838505 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 5035941 9 -1.3177450372049373 BM25
0033878fb91e955d22822f40cec7c2e1 Q0 5760705 10 -1.3337793139522363 BM25
...
```
Can you explain a little bit about the meaning of this result?
In the visualization code, I got this error:
`"usage:" (case-insensitive) not found.`
> I find out .trec file after I executing the BM25 command code [...] Can you explain a little bit about the meaning of this result?

You can load it using `ranx`. See https://amenra.github.io/ranx/ or https://github.com/PaulLerner/ViQuAE/blob/ff26b88208871165b3122aaededfcb2ff74d6482/meerqat/viz/html.py#L35
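For reference, each line of the file is one record in the standard TREC run format: `<question id> Q0 <passage index> <rank> <score> <run name>`. A minimal sketch with `ranx` (the path is illustrative):

```python
from ranx import Run

run = Run.from_file("metrics/BM25.trec", kind="trec")
# mapping: question id -> {passage index: BM25 score}
print(run.to_dict()["0033878fb91e955d22822f40cec7c2e1"])
```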
> In the visualize code, i got the error that is: `"usage:" (case-insensitive) not found.`

This is for parsing the docstring with `docopt`; you do not need to use that from Google Colab, unless you run a shell command like `!python -m meerqat.viz.html <args>`. Now you can call `format_html` directly with the arguments you need.
Thank you so much.
I'll try it and inform you soon.
Hey @BruceKenas, did you manage to reproduce the results? If yes, could you please share your code and close this issue?
Bests,
Paul
Dear Paul,
I'm still working on it, but I'm kind of busy with a lot of work at this time. I want to finish it soon, but it seems I must take more time on it. If you want to close the issue, please close it.
Thank you so much.
Hey @BruceKenas, I guess I can close this. Feel free to share your code if it may help :)
Yeah, sure, you can close this topic. Sorry, life has been very busy, and up to now I haven't been able to finish the whole Colab experiment on ViQuAE. Thanks a lot for your support during that time!
Best regards,
Bruce Kenas