Jindi0 / SAFE


How to get '../pf_embedding/case_headline.npy', '../pf_embedding/case_body.npy', '../pf_embedding/case_image.npy' and '../pf_embedding/case_y_fn.npy' #7

Open Jewellll opened 2 years ago

Jewellll commented 2 years ago

I did not find any information about '../pf_embedding/case_headline.npy', '../pf_embedding/case_body.npy', '../pf_embedding/case_image.npy', or '../pf_embedding/case_y_fn.npy' in the GitHub documentation. I only have the embedding results (headline.npy and body.npy) for each article. How do I merge them together, or is there another way?

Jindi0 commented 2 years ago

The code that references these paths is only an interface for loading your data. These embedding files should be created from your own dataset. We provide the headline and body embeddings, but not the image embeddings. To run the program, you need to collect the images first, use the Show-and-Tell tool to extract their content, and then use the embedding tool to embed that text and generate image.npy.

Jewellll commented 2 years ago

Thank you very much for your helpful answer. If I want to use the source text dataset, running the source code gives me a headline.npy and a body.npy for each news article. Should I merge the embedding results of all articles into one file, for example merging every article's headline.npy into case_headline.npy? Or is there another specific method? Please forgive my limited knowledge, thank you.

Jindi0 commented 2 years ago

For convenience, we merge all the embedding results of the headline into a case_headline.npy file, the same for body and image. As you can see in the code, each component of the news is loaded once. But you are able to separately load selected news by modifying the data loading code if your dataset is huge or you want to use different subsets of the dataset in your experiments.
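One way to load only selected news from a large merged file is NumPy's memory mapping. The sketch below is an illustration only, not the repository's loader: the array sizes, the `wanted` index list, and the temporary file are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical merged file: 5 articles, 20 words each, 300-dimensional vectors.
merged = np.random.rand(5, 20, 300).astype(np.float32)
wanted = [0, 3]  # hypothetical indices of the articles used in one experiment

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "case_headline.npy")
    np.save(path, merged)

    # mmap_mode="r" maps the file instead of reading all of it into memory.
    mapped = np.load(path, mmap_mode="r")
    # Indexing with a list copies only the selected articles into RAM.
    subset = np.array(mapped[wanted])
    del mapped  # release the mapping before the temporary file is removed

print(subset.shape)  # (2, 20, 300)
```

The same index list would have to be applied to the body, image, and label arrays so that the components of each article stay aligned.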

Teddy12155555 commented 2 years ago

Hi, I still can't get this repo to run. You mentioned merging all the headline embedding results into a case_headline.npy file, but I can't find where that happens in the code. Could you show me where that code is, or show me the merged format of case_headline.npy? Please forgive my limited knowledge, thanks.

Tangnameless commented 2 years ago

I don't know how to merge so many headline.npy and body.npy files.

fovik1126 commented 2 years ago

How do I get y_fn? Its values should be 0 or 1. Does it also need to be embedded by the SIF tool?

Jindi0 commented 2 years ago

Sorry for this late reply. After finishing the text embedding, the text of an article will be represented as a NumPy array. We use np.concatenate to merge the embedding results of multiple articles into a huge NumPy array and store it as a npy file. You will find the embedding results in the README file of the embedding branch.
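A minimal sketch of that merge step, assuming every per-article array has already been brought to the same shape. The sizes, the stand-in arrays, and the temporary output path are hypothetical; in practice each array would come from `np.load` on an article's own file.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for per-article embeddings of identical shape
# (n words x m dimensions).
n_words, emb_dim = 20, 300
per_article = [np.full((n_words, emb_dim), i, dtype=np.float32) for i in range(3)]

# Give each article a leading axis, then concatenate along it -> (p, n, m).
merged = np.concatenate([a[np.newaxis, ...] for a in per_article], axis=0)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "case_headline.npy")
    np.save(out_path, merged)      # one file holding all articles
    reloaded = np.load(out_path)

print(reloaded.shape)  # (3, 20, 300)
```

Concatenating the raw 2-D arrays without the extra axis would instead produce a single 2-D array of shape (p*n, m).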

thinking024 commented 1 year ago

I'm sorry, but the article text embedding results in FakeNewsNet_Dataset_processed on Google Drive, which you mention in the embedding branch, are still separate files. I tried to use np.concatenate to merge the embedding results, but the concatenated result is a 2-D ndarray, whereas x_head and x_body seem to be 3-D ndarrays in your code.

Jindi0 commented 1 year ago

Good morning Yiyuan,

Thanks for your interest in our work! I will answer your questions in this email.

For the question you sent to Dr. Zhou: You used case_image.npy in your code for train.py. But where does case_image.npy come from? Did you get a sentence from NewsImage in the FakeNewsNet dataset using the ShowAndTell model, feed that sentence to SIF, and then generate case_image.npy?

Because we do not provide the image sentence file, users need to run ShowAndTell to obtain the image sentence. The sentence extracted from an image should be embedded into an array of size n x m, where n is the number of words in the sentence and m is the size of each word's embedding vector.

For the question on GitHub: I'm sorry, but the article text embedding results in FakeNewsNet_Dataset_processed on Google Drive, which you mention in the embedding branch, are still separate files. I tried to use np.concatenate to merge the embedding results, but the concatenated result is a 2-D ndarray, whereas x_head and x_body seem to be 3-D ndarrays in your code.

Please form the NumPy array in the shape of p x n x m, where p is the number of news you want to use, n is the number of words in the article, and m is the size of each word's embedding vector. np.reshape could convert your 2D array to 3D with the specified shape.
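The reshape described above can be sketched as follows. It only works when every article contributes the same number of rows; the values of `p`, `n`, and `m` and the random arrays are assumptions chosen for illustration.

```python
import numpy as np

p, n, m = 4, 20, 300  # hypothetical: 4 articles, 20 words each, 300-d vectors
articles = [np.random.rand(n, m) for _ in range(p)]

# Plain concatenation along axis 0 yields a 2-D array of shape (p*n, m).
flat = np.concatenate(articles, axis=0)

# Because each article occupies n consecutive rows, reshape recovers (p, n, m).
cube = flat.reshape(p, n, m)

# Each slice along the first axis is one article's original embedding.
same = all(np.array_equal(cube[i], articles[i]) for i in range(p))
print(flat.shape, cube.shape, same)
```

If the articles differ in word count, the rows no longer split evenly and the reshape either fails or mixes words from different articles.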

I will modify the Readme file in Github later to provide more explanation. Thanks!

Sincerely, Jindi



thinking024 commented 1 year ago

Forgive me, but you recommend forming a NumPy array of shape p x n x m, where n is the number of words in the article, yet the number of words differs widely between news articles.

For example, the embedding of news A is a 100x300 ndarray, meaning A has 100 words, while the embedding of news B, which has 60 words, is a 60x300 ndarray. How can I combine them into a 3-D NumPy array? I can't apply np.reshape or np.stack. Should I zero-pad B's result to 100x300 so that I can combine A and B into a 3-D ndarray of shape 2x100x300?

Besides, even if I use padding, the shapes of the image.npy files, which SIF generates from the sentences that ShowAndTell produces for each image, differ too. Some of them have a shape of 50x300, and 50 exceeds image_size, which is set to 20 in your code. Padding cannot fix these image arrays.


Jindi0 commented 1 year ago

If you check the README in the embedding branch of the GitHub repository, you will find that the number of words kept is customizable. All words are kept in the files we provide. Please make the number of words uniform across news articles according to your needs, e.g., keep the first 20 words of the headline, the first 50 words of the body text, and the first 10 words of the image text.
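A minimal sketch of making the word count uniform by keeping the first k words. The helper name `fix_length` is hypothetical, and zero-padding texts shorter than k is an assumption taken from the question earlier in the thread rather than something the reply confirms.

```python
import numpy as np

def fix_length(emb, max_words):
    """Keep the first max_words rows; zero-pad at the end if there are fewer."""
    emb = emb[:max_words]
    missing = max_words - emb.shape[0]
    if missing > 0:
        # Pad only the word axis (rows); leave the embedding axis unchanged.
        emb = np.pad(emb, ((0, missing), (0, 0)), mode="constant")
    return emb

# News A has 100 words, news B has 60; both are cut to the first 50 words.
news_a = np.ones((100, 300))
news_b = np.ones((60, 300))
bodies = np.stack([fix_length(x, 50) for x in (news_a, news_b)])
print(bodies.shape)  # (2, 50, 300)

# A 7-word text padded to 10 words: the last 3 rows are zeros.
short = fix_length(np.ones((7, 300)), 10)
print(short.shape)  # (10, 300)
```

Truncating first also handles the image text case raised above: a 50x300 array cut to `fix_length(x, 20)` fits an image_size of 20.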



thinking024 commented 1 year ago

Hi, Jindi,

Thank you for taking the trouble to answer my questions. With your help, I ran train.py successfully. But another problem appeared when I tried to run test.py. What does key_list mean in your code, and how can I generate case_keys.txt?

I would appreciate it if you could take time out of your busy schedule to answer my questions.

Sincerely, Yiyuan Zhu


balabalacc commented 1 year ago

I would like to ask where case_y_fn.npy in train.py comes from and how it is generated.

123x12 commented 1 month ago

Hello, I would like to ask where case_y_fn.npy in train.py comes from and how it is generated. Thank you for your trouble.

thinking024 commented 1 month ago

It holds the ground-truth labels of the samples (0 or 1), stored as a 2-D tensor.
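The thread does not say which 2-D layout train.py expects, so the sketch below shows two common candidates built from a hypothetical label list: a column vector of shape (p, 1) and a one-hot matrix of shape (p, 2). Treat both as assumptions and check the label placeholder in train.py before choosing one.

```python
import numpy as np

# Hypothetical ground-truth labels, one per news article, in the same order
# as the articles in the headline, body, and image arrays.
labels = [0, 1, 1, 0]

# Candidate 1: a column vector, shape (p, 1).
y_column = np.asarray(labels, dtype=np.int64).reshape(-1, 1)

# Candidate 2: one-hot rows, shape (p, 2); row i has a 1 in column labels[i].
y_onehot = np.eye(2, dtype=np.float32)[labels]

print(y_column.shape, y_onehot.shape)
# np.save("case_y_fn.npy", <chosen array>) would then write the label file.
```

No embedding step is involved: the labels are plain integers taken from the dataset's annotations.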
