Open nitinpi0210 opened 2 years ago
Thanks for giving me access to All_MDB.csv. This file contains the fields: word_id, words, sent_id, label. To run the OIE notebook, I still need the following fields in the dataset: word_id, word, pred, pred_id, head_pred_id, sent_id, run_id, label.
For example, the code you have from the Stanovsky paper for getting sentences from the df needs the run_id:
def get_sents_from_df(df):
    return [df[df.run_id == run_id]
            for run_id
            in sorted(set(df.run_id.values))]
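To make the role of run_id concrete, here is a minimal sketch of the same function applied to a toy frame. The rows below are made up for illustration and are not taken from the MalwareDB data; only the column names come from this thread.

```python
import pandas as pd


def get_sents_from_df(df):
    """Return one sub-frame per run_id, in ascending run_id order."""
    return [df[df.run_id == run_id]
            for run_id
            in sorted(set(df.run_id.values))]


# Hypothetical toy rows: two runs, one per sentence.
toy = pd.DataFrame({
    "word_id": [0, 1, 0, 1, 2],
    "word": ["Trojan", "spreads", "It", "drops", "files"],
    "sent_id": [0, 0, 1, 1, 1],
    "run_id": [0, 0, 1, 1, 1],
})

sents = get_sents_from_df(toy)
for sent in sents:
    print(list(sent.word))
```

Without a run_id column, the attribute access `df.run_id` raises an AttributeError, which is consistent with the notebook failing on a file that only has word_id, words, sent_id, and label.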
And then later on, when you call load_dataset_encodeinputs, it needs the following fields:
df.word_id = pd.to_numeric(df.word_id, errors='coerce').astype('Int64')
df.run_id = pd.to_numeric(df.run_id, errors='coerce').astype('Int64')
df.sent_id = pd.to_numeric(df.sent_id, errors='coerce').astype('Int64')
df.head_pred_id = pd.to_numeric(df.head_pred_id, errors='coerce').astype('Int64')
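A quick pre-flight check can report which of these fields a CSV is missing before the notebook fails partway through. This is only a sketch: the helper names `missing_columns` and `coerce_id_columns` are hypothetical and not part of the notebook, and the required column list is the one given in this thread.

```python
import pandas as pd

# Column names as listed in this thread.
REQUIRED_COLUMNS = ["word_id", "word", "pred", "pred_id",
                    "head_pred_id", "sent_id", "run_id", "label"]
ID_COLUMNS = ["word_id", "run_id", "sent_id", "head_pred_id"]


def missing_columns(df):
    """Hypothetical helper: list required columns absent from df."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


def coerce_id_columns(df):
    """Hypothetical helper: cast id columns to nullable Int64.

    Values that cannot be parsed as numbers become <NA> instead of raising.
    """
    out = df.copy()
    for col in ID_COLUMNS:
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("Int64")
    return out


# A frame shaped like the All_MDB.csv described above (values are made up).
partial = pd.DataFrame({"word_id": ["0"], "words": ["Trojan"],
                        "sent_id": ["0"], "label": ["O"]})
print(missing_columns(partial))
```

The nullable `Int64` dtype is what lets unparseable entries survive as missing values; a plain `int64` cast would fail on them.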
Is it possible for you to upload the MalwareDB dataset that you used for the OIE notebook that contains all the fields needed to successfully run the notebook?
I just tried executing the OIE notebook using the new ALL_MDB.csv file you uploaded and, as expected, I get an error because it doesn't contain the run_id field. Can you please upload the MalwareDB dataset that contains the run_id field? Also, can you please clarify what the run_id field is?
I copied the entire sent_id column as run_id and got past the run_id issue, but the dataset still doesn't contain all the fields required for the OIE notebook to run correctly. It needs the pred column and is complaining about that:
Hi @nitinpi0210,
I am able to run to ~block 7, my code is here: https://colab.research.google.com/drive/1Kh9gsdG2rcySVo-GV5Xc9mW7rx6-pVuW?usp=sharing
But I'm facing an error with TensorFlow. I put it here for you guys; I would really appreciate it if you are able to solve the TensorFlow issue!
Hi @malcolm1232 can you share your notebook with nitin.pillai@berkeley.edu or nitinpillai@gmail.com? I can't access it to help you debug.
Also @malcolm1232 how were you able to run until block 7 with the malware db dataset? It doesn't contain all those fields that are needed? Can you please upload the dataset that you ran OIE until Block 7 and give me access?
Sarhan said she will reply later this week as she is busy with her deadlines this week. So as soon as I get her dataset, I will try again too. But in the meantime if you modified the dataset to get it to run to Block 7, can you share that dataset? (nitin.pillai@berkeley.edu)
Hi @nitinpi0210, I have given access. The dataset used was from the author, under _MSB_all_csv.csv. I was able to run it by manipulating the dataset provided by the author (assuming I did it correctly). Have a good day! Do let me know if you run into any trouble.
Hi @malcolm1232, I am also facing the same issue: I can't access the notebook. Can you please give access to my email: harsh.jaiswal4@gmail.com.
Regards, Harsh Vardhan Jaiswal
@hvjrocks-ds, I have done so already. @IS5882, I was wondering if you recall which TensorFlow version you were using? Do feel free to let me know the TF version when you are free!
@malcolm1232 thanks for sharing. Btw, for the OIE notebook, we were supposed to use the malwaredb dataset as per the author. Why did you use the MSB dataset? That was supposed to be used for the NER notebook as per the paper.
Also is this the right move to do? Can you clarify why you are setting the head pred id to 0 throughout the dataframe?
I updated the public shared folder with OIE dataset that includes all fields
For the NER ?
I am using the following TF and Keras version :
But running into the following issue in Block 7
This is for the OIE Notebook. @IS5882 what version of TF and Keras is needed for the OIE?
Yes, I've got the same problem as well; we need to find out the right tensorflow/keras version.
Update: Drive Folder here: https://drive.google.com/drive/folders/1zbf2bLLknxEHLJkcVKKmGHnwB9LseCID
Also, @nitinpi0210, do note that the spacy_wrapper is a custom spaCy wrapper I created.
Actual code, from a library which is not available anymore:
from spacy_wrapper import spacy_whitespace_parser as spacy_ws
Custom code: I wrote the custom spaCy code from what I could understand of the objective of the initial spacy_ws, which is to "split on whitespace characters":
def spacyws(input_):
    # Split the input string on runs of whitespace and return the tokens.
    returns_ = input_.split()
    return returns_
Also, @IS5882, so sorry for the trouble, but the spacy_wrapper.py file is empty U.U Sorry for the inconvenience!
On whether setting head_pred_id to 0 throughout the dataframe is the right move: I did this because of the code:
assert(len(set(full_sent.head_pred_id.values)) == 1) # Sanity check
Since the sanity check only requires that the number of distinct values equals 1, I assumed it can be any integer.
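For anyone checking a dataset against that assert, here is a small sketch that lists the runs that would fail it. It assumes that `full_sent` in the notebook corresponds to the rows of a single run_id, which this thread does not confirm; the helper name `runs_with_mixed_heads` and the toy values are hypothetical.

```python
import pandas as pd


def runs_with_mixed_heads(df):
    """Return run_ids whose rows do not share exactly one head_pred_id."""
    # Count distinct head_pred_id values per run; missing values count too.
    counts = df.groupby("run_id")["head_pred_id"].nunique(dropna=False)
    return sorted(counts[counts != 1].index.tolist())


# Toy data (made up): run 0 is consistent, run 1 mixes two head ids.
toy = pd.DataFrame({
    "run_id": [0, 0, 1, 1],
    "head_pred_id": [1, 1, 0, 2],
})
print(runs_with_mixed_heads(toy))
```

Filling the whole column with a single constant makes every run pass this check, but it also means the column no longer distinguishes one head predicate from another, so a dataset with the real values is preferable when available.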
@malcolm1232 the author gave the new correct malware dataset that has the relevant fields. So you don't need to do all that DF modifications anymore. I just used the new dataset and can get to Block7 with no issues. Now dealing with tensorflow issues.
oh thanks a ton @IS5882 @nitinpi0210 ❤️ ❤️!!!
@IS5882 I am trying to run the OIE notebook, but have encountered the same tensorflow/keras error as @nitinpi0210, so I'm just wondering what tensorflow/keras version you were using! Oh yes, also: the spacy_wrapper.py file is empty U.U
@malcolm1232 @IS5882 The OIE notebook finally works. I didn't need to modify the Google Colab TF or Keras versions; both are running with their default 2.8.0 versions. What did the trick is the following 2 lines in Block 7. The original code imported from tensorflow.python.keras; remove the python from there:
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K
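If the same notebook has to run on environments where either import path might be the working one, a fallback import is one option. The sketch below is an assumption, not something from the notebook: `first_importable` is a hypothetical helper built on the standard-library `importlib`, and the TensorFlow lines are left as comments because they require TensorFlow to be installed.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports cleanly.

    Raises ImportError listing every failure if none can be imported.
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none importable: " + "; ".join(errors))


# Intended use in Block 7 (needs TensorFlow installed), public path first:
# layers = first_importable(["tensorflow.keras.layers",
#                            "tensorflow.python.keras.layers"])
# Layer = layers.Layer
```

Listing the public `tensorflow.keras` path first keeps the two-line fix above as the default and only falls back to the private path when the public one is unavailable.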
The OIE notebook now runs fine to completion. Thanks a lot @IS5882 for giving us the modified dataset. Ran to completion finally!
OMMMGGGG!!! Okkays I'll give it a try and let u know!!
hi, @nitinpi0210 i was able to run the notebook as well, but is it possible to share yours so i could take a look at it as well? sorry for the inconvenience!
Oh yes, I am wondering if you will be working on the Knowledge Graph as well? @nitinpi0210 @IS5882, I was wondering if you'd have the data for Knowledge_Graph_Canonicalization.ipynb as well!
hey @malcolm1232 whats your email so that I can share? Btw with just those 2 lines you should be able to get things running. Am only doing the OIE piece for now and will do NER and KG later.
Hi @nitinpi0210! My email is malcolmTHL95@gmail.com. Yes, I got them running already! But I would like to see your train/test split etc. I'm still working on the KG, which is much, much tougher to get working without the corresponding datasets xDD
@IS5882 @nitinpi0210 @hvjrocks-ds Help me, please!
@qlM0ri4rty the google verification code is based on your google credentials. When you run the cell, you should get a popup asking you to enter your google username and password. Make sure you enable popups in your browser so that it doesn't get blocked.
@nitinpi0210 Thanks, I just ran the notebook successfully, but I don't know why the output looks like this. I mean, shouldn't this be the NER's result?
Hey! I have a new email address to contact you. I wonder why the OIE notebook has no output files. I mean, shouldn't it have output files like a .csv? I just saw the visualization of the prediction; I think there should be an output model and a .csv file.
By the way, did you run the KG notebook? I'm working on it now. I hope I can get some help from you. Thanks!
Hi, could you please share the data files mentioned above with me? I can't find them. My email address is zzzxp111@gmail.com. Thank you so much!!!
@VICKY-ZZ shared my notebook with you : https://colab.research.google.com/drive/1faR2ByWpdbYQoVhtW971fads4HRTr96u
Let me know if you can view it.
Thank you sooooooo much!!!
@nitinpi0210 I encountered a problem like this when I tried to run your notebook. I am using the dataset shared by the author on Google Drive (all_MLB.ioe.zip). Is my dataset correct? I hope you can share your dataset. This is my email: jiaxsongsci@gmail.com Thank you so much!!!
It's great that you can run the code successfully. I still have some problems; could you please share the code and data files with me? My email is jpdong00@gmail.com. Thank you so much!
Thank you soooooooo much!!!!
Hi @nitinpi0210, could you please share the data files for OIE task with me mentioned above? I can't find it. My email address is ktrung2210@gmail.com. Thank you soo much !
Hi @IS5882, I am interested in your work as well, for my master's thesis on MISP KGs. Can you please share the MLB_all_csv and NER data with me as well? My email is l.lukas@hm.edu.
Hello, could you share the dataset with me? It would be so helpful for my project. Thanks so much! My email is jenfunf@gmail.com.
@nitinpi0210 Hi, can you please share the dataset with me? Thanks soooooooo much. My email address is zxt841104@126.com
Hello, could you share the dataset with me? It would be so helpful for my project. Thanks so much! My email is yxinmiracle@gmail.com
Hi @nitinpi0210, could you please share the modified dataset for OIE task with me mentioned above? I can't find it. My email address is pohu12138@gmail.com . Thank you soo much !
Hello @malcolm1232, could you share your notebook? Thank you so much. My email is bvx.thong0202@gmail.com
Hi @nitinpi0210, could you please share the modified dataset for OIE task with me mentioned above? I can't find it. My email address is zhangyiqiong999@163.com Thank you so much !
Hi @nitinpi0210, could you please share the data files for the OIE task mentioned above with me? Thanks for sharing. My email address is daiweinudt@163.com
@IS5882 Hi, I am studying cybersecurity knowledge graphs and want to do further research, so I need to reproduce your project. Could you share with me the modified dataset for the OIE task of this paper? My email address is yajruan@163.com, please contact me. Thank you so much!!!
Hi @nitinpi0210, could you please share the dataset for OIE task with me mentioned above. My email address is yajruan@163.com, please contact me. Thank you soo much!!!!
Hi Sarhan, for our NLP course project at Berkeley, we are following your paper on Open-CyKG. Just as another user, Malcolm, explained in one of the posts, we also need the datasets you used for the OIE Python notebook. I downloaded the malwaretextdb database directly from your paper's reference, but that doesn't contain any of the fields required by the downstream code, such as: word_id, word, pred, pred_id, head_pred_id, sent_id, run_id, label.
Can you please give me access to the datasets that are needed to successfully run the OIE notebook? My email is: nitin.pillai@berkeley.edu.
We are in a time crunch here with course deadlines approaching. So would be grateful if you could give us access to the datasets that you used for the OIE and NER notebooks.
Thanks, Nitin