commonsense / conceptnet-numberbatch

Cannot get annex files #33

Closed alphamupsiomega closed 8 years ago

alphamupsiomega commented 8 years ago

After everything else is successful, including a Git Annex installation, when I type:

cd code/data
git annex get

Nothing occurs. No downloads start. I tried this with cd code/source-data, and it does not work either. How can I download the data files?

Alternatively, is there a way to obtain the files without git annex?

rspeer commented 8 years ago

I checked the instructions. The cd code/data part seems spurious, as that's not the name of any directory involved; sorry about that. But I just ran git annex get in the root of a fresh checkout and it worked. Could you try it there?

I'm confused that you get no output from git annex get. Does git annex init do anything?

There isn't really an alternative to git-annex at the moment; I wanted to make sure to version-control the files so that someone could get a precise version of the data that goes with a precise version of the code.

But I have concluded that git-annex is a pain, especially for public consumption, so you might be happy to know that I've avoided git-annex and used a much more straightforward downloading scheme as I work on merging this code into the next version of the ConceptNet code.

alphamupsiomega commented 8 years ago

(python3env)MYMBP:source-data MY$ git init
Reinitialized existing Git repository in /Users/MY/conceptnet-numberbatch/code/source-data/.git/
(python3env)MYMBP:source-data MY$ git annex init
init  ok
(recording state in git...)
(python3env)MYMBP:source-data MY$ git annex get
(python3env)MYMBP:source-data MY$ 

I've tried this in other directories, and there is no difference. I assume you have the actual data files somewhere--can you find another way to share these files?

rspeer commented 8 years ago

I'll probably be working on that, but I'd like to understand what went wrong with the instructions here, for the benefit of others.

Why is /Users/MY/conceptnet-numberbatch/code/source-data/.git/ its own git repository? Did something in the instructions lead to that, or did you previously run git init there as well? That would presumably be the problem. Your source-data directory is an empty repository, and that's why git-annex has nothing to get.

The git repository you should be using is the one in the conceptnet-numberbatch directory. There shouldn't be any sub-repositories involved. I recommend removing the source-data/.git directory, not running git init (that's way different from git annex init), and trying again.
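
One way to spot stray sub-repositories like the one described above is to look for `.git` entries below the top level of the checkout. This is only a sketch: the function name `find_nested_repos` is made up for illustration, and it assumes you pass it the root of your conceptnet-numberbatch clone.

```shell
# Hypothetical helper (not part of Numberbatch): list every .git entry
# that sits below the top level of the directory given as $1.
find_nested_repos() {
    root="${1:-.}"
    (
        cd "$root" || exit 1
        # Skip the top-level ./.git, print any other .git, and do not
        # descend into the ones that are found.
        find . -path ./.git -prune -o -name .git -print -prune
    )
}
```

Running `find_nested_repos .` from the clone's root should print nothing in a healthy checkout. A line such as `./code/source-data/.git` would point at a repository created by an extra `git init`.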

alphamupsiomega commented 8 years ago

I did that as well, and there is no difference.

(python3env)MYMBP:conceptnet-numberbatch MY$ git annex get
git-annex: Not in a git repository.
(python3env)MYMBP:conceptnet-numberbatch MY$ git init
Initialized empty Git repository in /Users/MY/xNLP/conceptnet-numberbatch/.git/
(python3env)MYMBP:conceptnet-numberbatch MY$ git annex get
git-annex: First run: git-annex init
(python3env)MYMBP:conceptnet-numberbatch MY$ git-annex init
init  ok
(recording state in git...)
(python3env)MYMBP:conceptnet-numberbatch MY$ git annex get
(python3env)MYMBP:conceptnet-numberbatch MY$

rspeer commented 8 years ago

You should stop running git init. It's creating empty Git repositories.

You didn't tell me that you got "git-annex: Not in a git repository" before. This is what you need to fix. You need to be in a git repository. Not an empty one that you just made. You need to be in the conceptnet-numberbatch repository, the one that you presumably cloned at some point and that you're reporting an issue on.
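
A quick way to tell a real clone from an empty repository made by `git init` is to ask Git whether the current directory is inside a work tree and whether that repository has any commits. The following is a rough sketch using standard `git rev-parse` options; the function name `in_populated_repo` and its return codes are invented for this example.

```shell
# Hypothetical helper: succeeds only inside a Git work tree that already
# has history, i.e. a clone rather than a fresh, empty `git init` repository.
in_populated_repo() {
    # Fails when the current directory is not inside any repository.
    top=$(git rev-parse --show-toplevel 2>/dev/null) || {
        echo "not inside a Git repository" >&2
        return 1
    }
    # An empty repository has no commit for HEAD to resolve to.
    git rev-parse --verify --quiet HEAD >/dev/null 2>&1 || {
        echo "$top is an empty repository (no commits)" >&2
        return 2
    }
    # Print the top-level directory of the repository that was found.
    echo "$top"
}
```

If this prints the path of your conceptnet-numberbatch checkout, you are in the right place to run `git annex get`. Either error message suggests starting again from a fresh clone.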

rspeer commented 8 years ago

Meanwhile, I've run the current directions from scratch and confirmed that they work (all I had to fix was the thing about code/data). I think you've messed up your repository by running extra commands, but if you start over, it should work.

alphamupsiomega commented 8 years ago

How do I get "in" a git repository with git annex? I've not used git annex before, and its walkthrough does not explain the difference between git init and git annex init. It just runs them one after the other, so it's not clear what each does. Would you mind writing out the exact commands I need to type, line by line? Per your instructions, I've already deleted the empty Git repositories.

rspeer commented 8 years ago

git-annex is a tool that manages large files in a Git repository. You use it as part of a Git repository. There isn't a separate idea of a "git annex repository"; it's just extra data in a Git repository. conceptnet-numberbatch has this extra data, as a way of having large, version-controlled data files.

git init creates a new Git repository. git annex init sets up an existing Git repository to use git-annex if it's not already. You saw these in the git-annex walkthrough because it's walking you through how to start a new project. My directions told you to read the walkthrough, because git-annex is confusing, and thanks for reading it, but if you follow the walkthrough's directions you're going to make a new project. You should follow Numberbatch's directions, not the walkthrough's directions, to get Numberbatch.
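
To make the distinction concrete, here is a small sketch that reports which of the two setup steps has taken effect in the current directory. It rests on the assumption that `git annex init` records a per-repository `annex.uuid` setting in the local Git configuration. The function name `repo_state` and its three labels are hypothetical.

```shell
# Hypothetical helper: report how far the current directory has been set up.
repo_state() {
    if ! git rev-parse --git-dir >/dev/null 2>&1; then
        # No repository here: neither a clone nor `git init` has happened.
        echo "no-repo"
    elif git config --local --get annex.uuid >/dev/null 2>&1; then
        # Assumption: `git annex init` stores annex.uuid in .git/config.
        echo "git+annex"
    else
        # A Git repository exists, but git-annex is not set up in it.
        echo "git-only"
    fi
}
```

Note that `git+annex` only says git-annex is set up. An empty repository can reach that state too, and `git annex get` will still have nothing to fetch there.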

Here's what you'll need to run. First clone this repository and go to its directory:

git clone https://github.com/LuminosoInsight/conceptnet-numberbatch
cd conceptnet-numberbatch

Then run the directions in the README:

cd code
python setup.py develop
git annex get
cd ..
python ninja.py
ninja

alphamupsiomega commented 8 years ago

Got it, git annex get works now. Thanks for your help. Since the cd code/data step did not work initially, I thought git annex had to be initialized separately from your instructions.