ISWC-Reproducibility-Track / Paper_608

Example 6 #6

Open angelosalatino opened 3 years ago

angelosalatino commented 3 years ago

Hi, I downloaded wt.10000.nt from https://drive.google.com/file/d/1gXYFqyqPtjvfYvFjHl53sKMF0Y489JHz/view

When I run

!kgtk filter -p ';tn:cellValue;' -i $WT.tsv | grep '\twiki:' | sed 's/wiki:/http:\/\/en.wikipedia.org\/wiki\//' > temp.tsv
!echo -e "node1\tlabel\tnode2" > $WT.wikipedia.tsv
!cat temp.tsv >> $WT.wikipedia.tsv
!rm temp.tsv

the file $WT.wikipedia.tsv is empty: it contains only the header. This is because temp.tsv is already empty before it is deleted. @dgarijo
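A related portability detail, separate from the empty temp.tsv (the header itself was written correctly here): POSIX does not define `echo -e`, and some `/bin/sh` implementations such as dash print the `-e` literally. The sketch below writes the same header with `printf` and appends the rows in a single redirection, so no temporary file is needed. `rows.tsv` and `wt.wikipedia.tsv` are hypothetical stand-ins; in the notebook the `cat rows.tsv` line would be replaced by the filter pipeline itself.

```shell
#!/bin/sh
# Minimal sketch: write the KGTK header with printf and append the data rows
# in one redirection. "rows.tsv" is a hypothetical stand-in for the output
# of the kgtk filter | grep | sed pipeline.
set -eu

# Hypothetical sample row, modeled on the output shown later in this thread.
printf 'X:c5020573_0\ttn:cellValue\thttp://en.wikipedia.org/wiki/Allmusic\n' > rows.tsv

# printf expands \t and \n the same way in every POSIX shell, unlike echo -e.
{
  printf 'node1\tlabel\tnode2\n'
  cat rows.tsv
} > wt.wikipedia.tsv

cat wt.wikipedia.tsv
```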

dgarijo commented 3 years ago

Correct me if I am wrong, but a few lines below there is a %env WT=wt.100000000

So the file should be this one: https://drive.google.com/file/d/1mCWFADxNnJwi4oIk8rhUBe3MA6o0AVvx/view?usp=sharing

Correct?
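For reference, a minimal sketch of how the file names follow from WT: IPython's %env magic sets an environment variable, so $WT.tsv in the ! commands resolves to wt.100000000.tsv. The script below only mimics that expansion in a plain shell; `input` and `output` are illustrative variable names, not part of the notebook.

```shell
#!/bin/sh
# Minimal sketch: mimic "%env WT=wt.100000000" by exporting WT, then build
# the same file names that the notebook's ! commands use.
export WT=wt.100000000

# The variable name ends at the ".", so "$WT.tsv" expands to wt.100000000.tsv.
input="$WT.tsv"
output="$WT.wikipedia.tsv"

echo "input:  $input"
echo "output: $output"
```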

angelosalatino commented 3 years ago

Hi Daniel, yes, you are right. I downloaded that file.

However, it still produces an empty temp.tsv.

I don't understand why.

dgarijo commented 3 years ago

Me neither. I will try to reproduce it and get back to you. The filter command has been updated, so maybe that is the cause.

dgarijo commented 3 years ago

I have reproduced the problem. I will dig in; maybe there is an issue with the file.

dgarijo commented 3 years ago

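OK, there is a small change that needs to be applied to that command (the notebook was built on macOS, and apparently this part does not carry over to Linux). You have to add -P to the grep command (the kgtk filter command is fine). GNU grep does not interpret \t as a tab in its default (basic) regular expressions, so '\twiki:' never matches a tab-separated row; with -P (Perl-compatible regular expressions), \t is a tab.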

kgtk filter -p ';tn:cellValue;' -i wt1000000.tsv | grep -P '\twiki:' | sed 's/wiki:/http:\/\/en.wikipedia.org\/wiki\//' > temp.tsv
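A minimal sketch that shows the difference on a single row, assuming GNU grep built with PCRE support (the usual grep on Linux; -P is a GNU grep option and is not available in every grep). The sample row is reconstructed from the output below, before the sed rewrite. It also shows a third, grep-dialect-independent option: passing a literal tab produced by printf.

```shell
#!/bin/sh
# Minimal sketch of why the original grep pattern matched nothing on Linux.
# Assumes GNU grep with PCRE support. No "set -e": grep -c exits with
# status 1 when it counts zero matches.
set -u

# One tab-separated sample row, modeled on the rows shown in this thread.
printf 'X:c5020573_0\ttn:cellValue\twiki:Allmusic\n' > sample.tsv

# Basic regex: GNU grep does not treat \t as a tab, so this counts 0 lines
# (recent GNU grep versions may also warn about a stray backslash).
basic=$(grep -c '\twiki:' sample.tsv)

# -P (Perl-compatible regex): \t is a tab, so this counts the row.
perl=$(grep -cP '\twiki:' sample.tsv)

# Portable alternative: put a literal tab character into the pattern.
tab=$(printf '\t')
portable=$(grep -c "${tab}wiki:" sample.tsv)

echo "basic=$basic perl=$perl portable=$portable"
```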

I tested this and you should get something like this:

root@f073e4812183:/out/Notebook6# head temp.tsv
X:c5020573_0    tn:cellValue    http://en.wikipedia.org/wiki/Allmusic
X:c5020574_6    tn:cellValue    http://en.wikipedia.org/wiki/Rockwilder
X:c5020574_10   tn:cellValue    http://en.wikipedia.org/wiki/Pete_Rock
X:c5020574_18   tn:cellValue    http://en.wikipedia.org/wiki/Fredro_Starr
X:c5020574_22   tn:cellValue    http://en.wikipedia.org/wiki/DJ_Clark_Kent
X:c5020574_25   tn:cellValue    http://en.wikipedia.org/wiki/The_Actual_%28song%29
X:c5020574_26   tn:cellValue    http://en.wikipedia.org/wiki/DJ_Premier
X:c5020574_30   tn:cellValue    http://en.wikipedia.org/wiki/Ron_%27%27Amen-Ra%27%27_Lawrence
X:c5020574_30   tn:cellValue    http://en.wikipedia.org/wiki/The_Hitmen_%28production_team%29
X:c5020578_1    tn:cellValue    http://en.wikipedia.org/wiki/American_Pie_Presents%3A_Band_Camp
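If grep -P is not available (for example with a non-GNU grep), a grep-free sketch of the same grep | sed step can use awk, which splits on tabs directly. This is an alternative technique, not the notebook's command: it keeps rows whose node2 column starts with "wiki:" and rewrites only that column, whereas the grep pattern matches "wiki:" after any tab. With the label already restricted to tn:cellValue, both should give the same rows. `filtered.tsv` is a hypothetical stand-in for the kgtk filter output; the first two data rows are reconstructed from the output above, and the third is a hypothetical non-wiki value that should be dropped.

```shell
#!/bin/sh
# Minimal sketch of a grep-free replacement for "grep -P ... | sed ...".
# filtered.tsv stands in for the kgtk filter output (header plus rows);
# in the notebook, kgtk filter could be piped straight into awk instead.
set -eu

printf 'node1\tlabel\tnode2\n' > filtered.tsv
printf 'X:c5020573_0\ttn:cellValue\twiki:Allmusic\n' >> filtered.tsv
printf 'X:c5020574_6\ttn:cellValue\twiki:Rockwilder\n' >> filtered.tsv
# Hypothetical non-wiki value: should not appear in temp.tsv.
printf 'X:c5020574_7\ttn:cellValue\tplain text\n' >> filtered.tsv

# Split and join on tabs; keep rows whose third field starts with "wiki:"
# and replace that 5-character prefix with the full Wikipedia URL prefix.
awk 'BEGIN { FS = OFS = "\t" }
     substr($3, 1, 5) == "wiki:" {
       $3 = "http://en.wikipedia.org/wiki/" substr($3, 6)
       print
     }' filtered.tsv > temp.tsv

cat temp.tsv
```

The header row is dropped because its node2 value is "node2", matching the behavior of the grep version.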

angelosalatino commented 3 years ago

OK. It works now.