Open angelosalatino opened 4 years ago
Just to add further info.
I downloaded the Docker image.
I ran it using
docker run -it -p 8888:8888 uscisii2/kgtk:latest /bin/bash -c "jupyter notebook --ip='*' --port=8888 --no-browser"
Then I went into kgtk/examples/ and started running Example1 - Embeddings.
Actually, even before that error, I see a warning at this cell:
%%bash
kgtk import_conceptnet --english_only conceptnet-assertions-5.7.0.csv / \
filter -p " ; /r/Causes,/r/UsedFor,/r/Synonym,/r/DefinedAs,/r/IsA ; " / sort -c 1,2,3 \
| head -30000 |
kgtk text_embedding --debug --embedding-projector-metadata-path none \
--embedding-projector-metadata-path none \
--label-properties "/r/Synonym" \
--isa-properties "/r/IsA" \
--description-properties "/r/DefinedAs" \
--property-value "/r/Causes" "/r/UsedFor" \
--has-properties "" \
-f kgtk_format \
--output-format kgtk_format \
--use-cache \
--model bert-large-nli-cls-token \
> emb.txt
and I get
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/kgtk-0.4.0-py3.7.egg/kgtk/exceptions.py", line 42, in __call__
return_code = func(*args, **kwargs) or 0
File "/opt/conda/lib/python3.7/site-packages/kgtk-0.4.0-py3.7.egg/kgtk/cli/text_embedding.py", line 332, in run
main(**kwargs)
File "/opt/conda/lib/python3.7/site-packages/kgtk-0.4.0-py3.7.egg/kgtk/cli/text_embedding.py", line 208, in main
property_labels_dict=property_labels_dict)
File "/opt/conda/lib/python3.7/site-packages/kgtk-0.4.0-py3.7.egg/kgtk/gt/embedding_utils.py", line 405, in read_input
raise KGTKException("Missing column: {}".format(missing_column))
kgtk.exceptions.KGTKException: Missing column: {'label'}
Missing column: {'label'}
Hi @angelosalatino, It took a while, but I have been able to reproduce this problem. We'll look into it and get back to you.
@angelosalatino, I have an explanation and a solution:
Explanation of the problem: You are getting this error because the command expects a file with a label column, but the file has a column named relation. These are equivalent, and the command supported both in a previous version. However, KGTK has undergone active development, and some of the commands have changed slightly and been made more consistent. It looks like the embeddings command now only accepts files with a label column. We are adding more and more tests to detect these things, but we may have missed some, like this one.
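A quick way to see which of the two names a given file uses is to inspect its header line. The sketch below assumes the file is tab-separated with the column names on the first line; the function name has_label_column is only illustrative and is not part of KGTK.

```shell
# Succeeds (exit status 0) if the header line of the tab-separated file
# given as $1 contains a column named exactly "label".
# Illustrative helper, not a KGTK command.
has_label_column() {
  head -n 1 "$1" | tr '\t' '\n' | grep -qx 'label'
}
```

For example, `has_label_column heads.kgtk || echo "no label column"` would flag a file whose header still needs fixing before it is passed to the embedding command.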
Solution: The easiest fix is to split the command in two:
kgtk import_conceptnet --english_only conceptnet-assertions-5.7.0.csv / \
filter -p " ; /r/Causes,/r/UsedFor,/r/Synonym,/r/DefinedAs,/r/IsA ; " / sort -c 1,2,3 \
| head -30000 > heads.kgtk
Then replace relation with label in the first row (the header):
sed -i '1!b;s/relation/label/' heads.kgtk
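The sed command above relies on GNU sed (`-i` without a suffix argument) and replaces the first occurrence of the text "relation" anywhere on line 1. If that is a concern, a possible alternative is an awk filter that renames only a header column called exactly relation. This is a sketch that assumes a tab-separated file; the function name rename_relation_header and the output file name in the usage comment are hypothetical.

```shell
# Rename a header column called exactly "relation" to "label" in the
# tab-separated file given as $1; data rows are printed unchanged.
# Usage (writes a new file instead of editing in place):
#   rename_relation_header heads.kgtk > heads.label.kgtk
rename_relation_header() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { for (i = 1; i <= NF; i++) if ($i == "relation") $i = "label" }
       { print }' "$1"
}
```

Because the result goes to standard output, the original heads.kgtk stays untouched; the new file would then be the one passed to the embedding command with -i.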
And then calculate the embedding:
kgtk text-embedding -i heads.kgtk --debug --embedding-projector-metadata-path none --embedding-projector-metadata-path none --label-properties "/r/Synonym" --isa-properties "/r/IsA" --description-properties "/r/DefinedAs" --property-value "/r/Causes" "/r/UsedFor" --has-properties "" -f kgtk_format --output-format kgtk_format --use-cache --model bert-large-nli-cls-token > emb.txt
If you want to save the time of rerunning the first command plus sed, I have done it in the attached file (I had to rename it from heads.kgtk to heads.txt because GitHub didn't like it).
Note that running the embedding may take a long time.
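Since the full run can be slow, one option is to try the embedding command on a small sample first. The sketch below keeps the header plus the first N data rows; it assumes the header is the first line of the file, and the function name kgtk_sample and the sample file name are hypothetical.

```shell
# Print the header line plus the first N data rows of the file given as $1.
# $2 is N, the number of data rows to keep.
# Usage: kgtk_sample heads.kgtk 100 > heads.sample.kgtk
kgtk_sample() {
  head -n "$(( $2 + 1 ))" "$1"
}
```

The sample file can then be passed to the embedding command with -i in place of heads.kgtk to check that the header fix works before starting the long run.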
In the meantime, I have opened issue https://github.com/usc-isi-i2/kgtk/issues/164. We will be fixing it in the next few days.
Using a GPU Colab notebook for running bert-large may do the trick
On Fri, 16 Oct 2020, 06:28 Daniel Garijo, notifications@github.com wrote:
heads.txt https://github.com/ISWC-Reproducibility-Track/Paper_608/files/5389172/heads.txt
This solution did the trick. Thank you @dgarijo
Hi guys, I followed @dgarijo's advice. I am testing it using Docker.
However, in this piece of code from Example1 - Embeddings
I get:
Do you know why it is generating such an error?