facebookresearch / InferSent

InferSent sentence embeddings
Other
2.28k stars 470 forks source link

Improve data security and fix for macos #18

Closed karlinnolabs closed 7 years ago

karlinnolabs commented 7 years ago

I should probably have added a check to see if 7za is installed (brew install p7zip) might do that later, at the minute I'm busy and can't contribute as much as I'd like.

facebook-github-bot commented 7 years ago

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

facebook-github-bot commented 7 years ago

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

aconneau commented 7 years ago

This is too verbose and makes it look unnecessary complicated for a single script that's simply downloading some text data.

"If the user is on macOS, download p7zip and use it instead of unzip." That's the only thing we should handle.

The checksum is an overkill for this simple script and may bring some additional bugs for other OS.

karlinnolabs commented 7 years ago

Given the size of the files, you may think my changes were overly verbose but it was prone to errors - doing a checksum when pulling data from the internet is a fairly standard data security & integrity, also checking the result of unzip before deleting the file is important because in my case when it failed it removed a file which took an hour to download.

aconneau commented 7 years ago

You are right. Though we are downloading well-known datasets from Stanford's website. The security&integrity issue here is not vital. The simpler the script, the better. Let's just fix the unzip macOS issue, and put an if statement (if the unzipped .txt file exists) before the "rm" with an error message.