PathwayCommons / factoid

A project to capture biological pathway data from academic papers
https://biofactoid.org
MIT License

Neo4j: Update README #1174

Closed: lindajiawenli closed this issue 1 year ago

lindajiawenli commented 1 year ago

Update the README to include instructions and specifications so that a user can:

  1. Run the Neo4j test suite
  2. Run Neo4j

Assumption: the user already knows how to use factoid.

Q: Are we keeping the Docker container?

jvwong commented 1 year ago

Using Docker is a choice that can be made later; I think the versions and dependencies are the key thing.

lindajiawenli commented 1 year ago

Notes:

Neo4j version: 5.4.0 (Community edition)
APOC version: 5.4.1

Found using the following Cypher queries:

call dbms.components() yield name, versions, edition unwind versions as version return name, version, edition;
RETURN apoc.version() AS output;
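Since the README mainly needs to pin versions, a small check like the one below could accompany the version notes. It is a minimal Python sketch, assuming (based only on the pairs reported in this thread: Neo4j 5.4.0 with APOC 5.4.1, and Neo4j 5.6.0 with APOC 5.6.0) that APOC fits a server when the major.minor versions match; `parse_version` and `apoc_matches_server` are hypothetical helpers, not part of factoid or Neo4j.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "5.4.1" into (5, 4, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def apoc_matches_server(neo4j_version: str, apoc_version: str) -> bool:
    """Assumption: APOC fits a Neo4j server when major.minor are equal."""
    return parse_version(neo4j_version)[:2] == parse_version(apoc_version)[:2]


if __name__ == "__main__":
    # Version pairs reported in this thread, plus one deliberately mismatched pair.
    for server, apoc in [("5.4.0", "5.4.1"), ("5.6.0", "5.6.0"), ("5.6.0", "5.4.1")]:
        print(server, apoc, apoc_matches_server(server, apoc))
```

The inputs are the strings returned by the two Cypher queries above (the `version` column of `dbms.components()` and the output of `apoc.version()`).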

jvwong commented 1 year ago

Great!

Update: As expected, 296fce1 running on https://unstable.factoid.baderlab.org/ does in fact

jvwong commented 1 year ago

Update: I've got a Dockerized Neo4j instance running and hooked up to the factoid app. It successfully populates the graph database on boot and accepts new docs on submit.

If you find a way to access the Donnelly LAN, you can access the Neo4j Browser at: 192.168.81.174:7474

lindajiawenli commented 1 year ago

Unfortunately it looks like I'd need VPN access for that, but I'm glad to hear the update!

[Screenshot, 2023-04-14 1:14:07 PM]

lindajiawenli commented 1 year ago

I have found two download options for APOC:

I believe the second link is the one we need, but I'm not 100% sure (these are also two completely different repos, which confuses me). The first link is the one provided in the documentation (https://neo4j.com/labs/apoc/5/installation/#neo4j-server).

As for Neo4j 5.4.0, I can't seem to find a download option for that version at https://neo4j.com/download-center/#community (an installation guide can be found here: https://neo4j.com/docs/operations-manual/current/installation/).

jvwong commented 1 year ago

https://neo4j.com/docs/operations-manual/current/installation/

It's probably sufficient to state that APOC is a requirement, along with a version that is known to work.

lindajiawenli commented 1 year ago

Current steps followed on a 2016 MacBook Air to get Neo4j running without Docker:

  1. Install Java 17 from https://www.oracle.com/java/technologies/downloads/
  2. Download Neo4j 5.6.0 Community Edition from https://neo4j.com/download-center/#community
  3. Open a terminal and, from your home directory, run tar -xf <path to neo4j-community-5.6.0-unix.tar> (this creates ~/neo4j-community-5.6.0)
  4. cd ~/neo4j-community-5.6.0/conf
  5. vim neo4j.conf
  6. Uncomment dbms.security.auth_enabled=false
  7. Download the APOC 5.6.0 core jar file from https://github.com/neo4j/apoc/releases/
  8. mv ~/Downloads/apoc-5.6.0-core.jar ~/neo4j-community-5.6.0/plugins
  9. cd ~/neo4j-community-5.6.0/
  10. To start Neo4j with APOC: ./bin/neo4j-admin server console
  11. Go to http://localhost:7474/browser/
  12. Press Ctrl-C to stop the Neo4j server
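Steps 7 and 8 are easy to get slightly wrong (for example, the jar is left in ~/Downloads or a different APOC version is copied). The Python sketch below checks the plugins folder before starting the server; it assumes the install location from steps 3-4 and the apoc-<version>-core.jar naming from step 8, and `find_apoc_version` is a hypothetical helper, not Neo4j tooling.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches names like "apoc-5.6.0-core.jar", as used in step 8.
APOC_CORE_JAR = re.compile(r"^apoc-(\d+\.\d+\.\d+)-core\.jar$")


def find_apoc_version(neo4j_home: Path) -> str | None:
    """Return the version of the first APOC core jar in <neo4j_home>/plugins, or None."""
    plugins = neo4j_home / "plugins"
    if not plugins.is_dir():
        return None
    for jar in sorted(plugins.iterdir()):
        match = APOC_CORE_JAR.match(jar.name)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    home = Path.home() / "neo4j-community-5.6.0"  # assumed install location (steps 3-4)
    version = find_apoc_version(home)
    if version is None:
        print(f"No APOC core jar found in {home / 'plugins'}")
    else:
        print(f"Found APOC {version} in {home / 'plugins'}")
```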

Errors when the Neo4j test suite is run:

[Screenshot, 2023-04-17 5:03:09 PM]

lindajiawenli commented 1 year ago

UPDATE: All tests pass when the following lines are uncommented in the config file (all are under the "Network connector configuration" heading):

server.default_advertised_address=localhost
server.default_listen_address=0.0.0.0

server.bolt.enabled=true
server.bolt.tls_level=DISABLED
server.bolt.listen_address=:7687
server.bolt.advertised_address=:7687

server.http.enabled=true
server.http.listen_address=:7474
server.http.advertised_address=:7474

To edit the Neo4j config file (assuming the steps in the previous comment were followed):

  1. Open a Terminal window
  2. cd ~/neo4j-community-5.6.0/conf
  3. vim neo4j.conf

All the config lines shown above should already be in the file, but they are commented out by default.
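Uncommenting these lines by hand in vim works fine; if the edit needs to be repeatable (for example, in README setup instructions), a sketch like the following could do it. It assumes each setting appears in neo4j.conf as a commented line of the exact form #key=value, as described above, and `uncomment_settings` and `apply_to_file` are hypothetical helpers, not Neo4j tooling.

```python
from __future__ import annotations

from pathlib import Path

# The "Network connector configuration" settings that had to be uncommented.
NETWORK_SETTINGS = [
    "server.default_advertised_address=localhost",
    "server.default_listen_address=0.0.0.0",
    "server.bolt.enabled=true",
    "server.bolt.tls_level=DISABLED",
    "server.bolt.listen_address=:7687",
    "server.bolt.advertised_address=:7687",
    "server.http.enabled=true",
    "server.http.listen_address=:7474",
    "server.http.advertised_address=:7474",
]


def uncomment_settings(conf_text: str, settings: list[str]) -> str:
    """Uncomment every line that is '#' followed by one of `settings` (surrounding spaces ignored)."""
    wanted = set(settings)
    out = []
    for line in conf_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        stripped = body.strip()
        candidate = stripped.lstrip("#").strip()
        if stripped.startswith("#") and candidate in wanted:
            out.append(candidate + ending)
        else:
            out.append(line)  # comments, headings and other settings are left alone
    return "".join(out)


def apply_to_file(conf_path: Path, settings: list[str]) -> None:
    """Rewrite neo4j.conf in place, saving the original as neo4j.conf.bak first."""
    original = conf_path.read_text()
    conf_path.with_name(conf_path.name + ".bak").write_text(original)
    conf_path.write_text(uncomment_settings(original, settings))
```

For example, apply_to_file(Path.home() / "neo4j-community-5.6.0" / "conf" / "neo4j.conf", NETWORK_SETTINGS) would rewrite the config and keep the original next to it as neo4j.conf.bak. Matching whole key=value lines, rather than just keys, avoids accidentally enabling a commented example with a different value.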

lindajiawenli commented 1 year ago

UPDATE: When I run $ LOG_LEVEL="debug" CRON_SCHEDULE="* * * * *" GRAPHDB_CRON_REFRESH_PERIOD_MINUTES="5" npm run watch, I do not see the 300+ nodes in my Neo4j Browser window. I have been troubleshooting this for a while now.
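For reference, CRON_SCHEDULE="* * * * *" is a standard five-field cron expression (minute, hour, day of month, month, day of week) in which every field is a wildcard, so it fires once every minute. The Python sketch below illustrates how such an expression is evaluated. It is a simplified, hypothetical matcher (numbers, *, ranges, lists and steps only; no names such as MON and no shortcuts such as @hourly) and is not the scheduler factoid uses; how factoid combines the schedule with GRAPHDB_CRON_REFRESH_PERIOD_MINUTES is not shown here.

```python
from __future__ import annotations

from datetime import datetime

# (lowest, highest) value for each field in standard cron order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one field such as "*", "*/5", "1-5" or "0,30" into the values it allows."""
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = int(base)
            end = hi if step_text else start  # "5/15" means 5, 20, 35, ...
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field cron expression `expr` fires at the minute `when`."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    minutes, hours, days, months, weekdays = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)
    )
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    day_ok = when.day in days
    weekday_ok = cron_weekday in weekdays
    # Classic cron rule: if both day fields are restricted, matching either one is enough.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_matches = day_ok or weekday_ok
    else:
        day_matches = day_ok and weekday_ok
    return (
        when.minute in minutes
        and when.hour in hours
        and when.month in months
        and day_matches
    )
```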

lindajiawenli commented 1 year ago

Comparison of what gets printed when the Docker Neo4j container starts vs. a non-Docker (local) Neo4j install:

Docker:

2023-03-23 09:18:57 2023-03-23 13:18:57.719+0000 INFO  ======== Neo4j 5.4.0 ========
2023-03-23 09:19:08 2023-03-23 13:19:08.300+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2023-03-23 09:19:11 2023-03-23 13:19:11.656+0000 INFO  Remote interface available at http://localhost:7474/
2023-03-23 09:19:11 2023-03-23 13:19:11.675+0000 INFO  id: D7E48376E091F97139B52102A47C2A9B672D993BFA4F995E7D42A5C77040D9E7
2023-03-23 09:19:11 2023-03-23 13:19:11.676+0000 INFO  name: system
2023-03-23 09:19:11 2023-03-23 13:19:11.677+0000 INFO  creationDate: 2023-02-09T18:46:13.058Z
2023-03-23 09:19:11 2023-03-23 13:19:11.677+0000 INFO  Started.

Non-Docker:

2023-04-18 12:44:18.260+0000 INFO  ======== Neo4j 5.6.0 ========
2023-04-18 12:44:24.326+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2023-04-18 12:44:25.312+0000 INFO  Remote interface available at http://localhost:7474/
2023-04-18 12:44:25.316+0000 INFO  id: E52E0E9B1552639B2860BD87EF7FB5352CEF9E66B439F81108E590D8C4E596AA
2023-04-18 12:44:25.317+0000 INFO  name: system
2023-04-18 12:44:25.317+0000 INFO  creationDate: 2023-04-17T18:10:38.286Z
2023-04-18 12:44:25.318+0000 INFO  Started.
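Apart from the version line, the two startup logs above report the same Bolt and HTTP endpoints. A quick way to compare such logs is sketched below in Python; the regular expressions are assumptions based only on the log lines shown here, and `summarize_startup` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Patterns based only on the startup lines shown above.
PATTERNS = {
    "version": re.compile(r"=+ Neo4j (\S+) =+"),
    "bolt": re.compile(r"Bolt enabled on (\S+:\d+)"),
    "http": re.compile(r"Remote interface available at (\S+)"),
}


def summarize_startup(log_text: str) -> dict[str, str]:
    """Pull the version, Bolt address, HTTP URL and 'Started.' status out of a startup log."""
    summary = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1)
        if line.rstrip().endswith("Started."):
            summary["started"] = "yes"
    return summary
```

Running it on both logs and comparing the two dictionaries key by key should show the version as the only difference, which supports the conclusion that the network setup itself matches.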

Note: I still haven't been able to find Neo4j 5.4.0, so I installed 5.6.0 instead. Hopefully the version isn't what's tripping me up; I don't think it is, because the tests still run fine.

lindajiawenli commented 1 year ago

UPDATE: I forgot to mention that a document is successfully added to Neo4j when it is submitted.

Right now, the only issue seems to be the cron refresh, i.e., loading all of the documents into Neo4j.

lindajiawenli commented 1 year ago

@jvwong Do you think the cron refresh needs some kind of special configuration? I still haven't figured it out. I'm a little surprised that the Neo4j database adds nodes and edges when a new document is submitted but doesn't do the refresh.

jvwong commented 1 year ago

It should populate Neo4j from the factoid database on start. Does it do that for you?

lindajiawenli commented 1 year ago

The "factoid" RethinkDB database, right? I was thinking about that (since they're connected in the docker-compose.yml file). No, I didn't do that! That might be it. I'll look up how to do it.

As a side note, I switched back to using the Docker container and noticed that the refresh isn't working there either, although that could be because I've changed a lot on my local computer. Could you confirm whether it's still working on your end?

lindajiawenli commented 1 year ago

I've been seeing this a lot recently (with both the Docker and non-Docker versions of Neo4j):

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x110b9f4b5 node::Abort() (.cold.1) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 2: 0x10f651219 node::Abort() [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 3: 0x10f6513fe node::OOMErrorHandler(char const*, bool) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 4: 0x10f7dab23 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 5: 0x10f9a37b5 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 6: 0x10f9a7c80 v8::internal::Heap::CollectSharedGarbage(v8::internal::GarbageCollectionReason) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 7: 0x10f9a44cf v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*, v8::GCCallbackFlags) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 8: 0x10f9a15a8 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
 9: 0x10fa39fa1 v8::internal::ScavengeJob::Task::RunInternal() [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
10: 0x10f6be4ce node::PerIsolatePlatformData::RunForegroundTask(std::__1::unique_ptr<v8::Task, std::__1::default_delete<v8::Task> >) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
11: 0x10f6bcf17 node::PerIsolatePlatformData::FlushForegroundTasksInternal() [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
12: 0x110105cbb uv__async_io [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
13: 0x11011a75a uv__io_poll [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
14: 0x110106288 uv_run [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
15: 0x10f586773 node::SpinEventLoop(node::Environment*) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
16: 0x10f695a96 node::NodeMainInstance::Run() [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
17: 0x10f61711c node::LoadSnapshotDataAndRun(node::SnapshotData const**, node::InitializationResult const*) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
18: 0x10f6173c3 node::Start(int, char**) [/Users/jiawen/.nvm/versions/node/v18.13.0/bin/node]
19: 0x7fff207c4f3d start [/usr/lib/system/libdyld.dylib]
20: 0x4 
[nodemon] app crashed - waiting for file changes before starting...

maxkfranz commented 1 year ago

If you're loading all the docs into memory and then allocating more memory per doc, you may use a fair bit at once, and the default heap size is small. Normally you stream or chunk data in Node, so you usually don't need much memory.

You could either chunk your processing or just increase the heap size. If the import-all case should only ever be used once, the simplest option may be to increase the heap size.

https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-megabytes
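The chunking approach suggested above looks roughly like the sketch below. It is written in Python purely for illustration (factoid itself runs on Node), and `fetch_docs` and `write_docs` are hypothetical stand-ins for reading documents from the app's database and writing them to Neo4j. The alternative, raising the heap limit, uses the Node flag linked above, for example node --max-old-space-size=4096 (the value is in megabytes; 4096 is only an illustrative number), which can also be passed to an npm script through NODE_OPTIONS="--max-old-space-size=4096".

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def import_all(
    doc_ids: Iterable[str],
    fetch_docs: Callable[[list[str]], list[dict]],
    write_docs: Callable[[list[dict]], None],
    batch_size: int = 50,
) -> int:
    """Hypothetical import-all loop: only one batch of documents is in memory at a time."""
    total = 0
    for ids in batched(doc_ids, batch_size):
        docs = fetch_docs(ids)  # e.g. read this batch from the app's primary database
        write_docs(docs)        # e.g. write this batch to Neo4j
        total += len(docs)
    return total
```

Peak memory then scales with batch_size rather than with the total number of documents, at the cost of more round trips to each database.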

lindajiawenli commented 1 year ago

The cron refresh and everything else now work perfectly after a git pull. The memory error has not reappeared.

Note: You can see all the nodes by clicking the settings gear in the bottom-left corner of Neo4j Browser and changing "Initial Node Display" to 700.

[Screenshot, 2023-04-19 12:34:38 PM]