Open jordane95 opened 1 year ago
By the way, while unzipping the archive, I found some errors in the log:
Archive: tart_dual_train_data.zip
warning [tart_dual_train_data.zip]: 12884901888 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 12884901888
(attempting to re-compensate)
inflating: minilm_denoised_T0_32_datasets_fixed_instruction_unfollowing_fixed_train.jsonl
error: invalid compressed data to inflate
So I'm not sure whether I downloaded it correctly from Google Drive.
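One way to tell a bad download apart from a tool problem is to check the archive with a different reader. The reported offset, 12884901888, is exactly 3 × 2^32 bytes, which may point to the unzip build mishandling a large (ZIP64) archive rather than to a corrupted file, though that is only a guess. Below is a minimal sketch using Python's standard zipfile module, which supports ZIP64; `check_zip` is a hypothetical helper name, not something from the repository.

```python
import sys
import zipfile
import zlib


def check_zip(path):
    """Return (ok, message) after reading every member of the archive."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads each member and returns the name of the
            # first one that fails its check, or None if all are fine.
            bad = zf.testzip()
    except (zipfile.BadZipFile, zlib.error, OSError) as exc:
        return False, f"cannot read archive: {exc}"
    if bad is not None:
        return False, f"corrupt member: {bad}"
    return True, "all members passed the integrity check"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(check_zip(sys.argv[1]))
```

If this check passes while unzip still fails, the download itself is probably fine and the extraction tool is the more likely culprit.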
Another problem: I wanted to regroup the data by source by finding the instruction in each question field and using this file for the reverse mapping. However, I found that some instructions are not included in the file.
For example, one instruction I encountered was:
Given questions asked in StackExchange, a community-powered Q&A sites, retrieve a duplicated question body asking the same as this question.
But there is no such instruction in berri_instructions.tsv...
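The regrouping described above could be sketched roughly as follows. Several things here are assumptions rather than facts about the released data: that each JSONL record has a "question" field in which the instruction is followed by a " [SEP] " separator, and that `instruction_to_source` is a dictionary you build yourself from berri_instructions.tsv (its column layout is not shown in this thread). The function names are hypothetical.

```python
import json
from collections import Counter, defaultdict

SEP = " [SEP] "  # assumed separator between instruction and query


def split_instruction(question, sep=SEP):
    """Return the instruction part of a question, or None if no separator."""
    head, found, _ = question.partition(sep)
    return head.strip() if found else None


def group_by_source(lines, instruction_to_source):
    """Group JSONL records by source and count unmatched instructions."""
    groups = defaultdict(list)
    unmatched = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        instruction = split_instruction(record.get("question", ""))
        source = instruction_to_source.get(instruction)
        if source is None:
            # Instruction missing from the mapping (or no separator found).
            unmatched[instruction] += 1
        else:
            groups[source].append(record)
    return dict(groups), unmatched
```

Printing `unmatched.most_common()` would list the instructions that have no entry in the mapping, such as the StackExchange one quoted above.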
Hello! I have run into the same problem. Have you solved it yet?
@AkariAsai Could you help us solve this problem? Thanks a lot!~
Yes. I think it might have been a network issue during the download. I re-downloaded it and everything works well.
Thanks! I will try again.
Hi! I tried downloading tart_dual_train_data.zip again, but the error occurred again while unzipping it. Sorry to bother you, but could you send me the ZIP file that you unzipped successfully when you have some spare time?
I would really appreciate it.
Thanks a lot!~
Hello @ImmortalCi, I came across this issue and managed to unzip the file correctly using the Java jar command, specifically:
jar -xvf <zip file>
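For anyone without a JDK installed, Python's standard zipfile module may be another option, since it also supports ZIP64 archives. This is only a sketch and has not been tried against this particular archive; `extract_zip` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, `extract_zip("tart_dual_train_data.zip", "tart_data")` would unpack the archive into a tart_data directory, which is an arbitrary name chosen for illustration.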
Thanks for your reply! I will give it a try~
Hi, very interesting work on bringing instructions into retrieval!
I want to use the tart-dual data for replication. After downloading and unzipping it, I found only one file, named minilm_denoised_T0_32_datasets_fixed_instruction_unfollowing_fixed_train.jsonl. Is this the exact data used to fine-tune the tart-dual model in your paper? I ask because this file contains only 1M lines, yet the paper states that BERRI has 5M instances from 37 datasets.
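For reference, the line count can be confirmed with a small script like the one below. It assumes one JSON record per non-blank line, which is the usual JSONL convention but is not confirmed for this file; `count_jsonl_records` is a hypothetical helper name.

```python
def count_jsonl_records(path):
    """Count non-blank lines in a JSONL file without loading it into memory."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                count += 1
    return count
```

Streaming the file line by line keeps memory use flat even for a file of several gigabytes.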