NextCenturyCorporation / AIDA-Interchange-Format

Java/Python library and validator for the AIDA Interchange Format (AIF). Originally based on isi-vista/gaia-interchange.
MIT License

Output filesize varied greatly with FULL and SHATTER mode #15

Closed: fatestigma closed this issue 5 years ago

fatestigma commented 6 years ago

With the latest build (6940c5a), running coldstart2AidaInterchange with the same parameters except for the mode, the output size varies greatly. SHATTER mode outputs 151 Turtle files totaling ~4.3 GB, while FULL mode outputs a single file of only ~56 MB. Is that normal?

My input parameters for SHATTER mode:

inputKBFile: /path/to/eng.cs
baseURI: http://www.isi.edu/aida
systemURI: http://www.rpi.edu/tinkerbell
mode: SHATTER
ontology: rpi_seedling
outputAIFDirectory: /path/to/output

and part of the parameters for FULL mode:

mode: FULL
outputAIFFile: /path/to/output.nt

gabbard commented 6 years ago

@fatestigma: Actually, with just 151 files, 56 MB of output sounds reasonable. It's more surprising that the blow-up from the Turtle format is so large that we end up at 4.3 GB. I'll look into it, but in the meantime you can be reasonably confident that the FULL output is correct.
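
For anyone wanting to reproduce the size comparison, a minimal sketch follows. It only sums file sizes on disk; the helper name `total_size_bytes` and the `*.ttl` pattern are assumptions made for illustration and are not part of the AIF tooling.

```python
from pathlib import Path


def total_size_bytes(directory, pattern="*.ttl"):
    """Sum the sizes of all regular files in `directory` matching `pattern`.

    Hypothetical helper: the "*.ttl" default assumes the SHATTER output
    directory holds Turtle files with that extension.
    """
    return sum(
        path.stat().st_size
        for path in Path(directory).glob(pattern)
        if path.is_file()
    )


def size_ratio(shatter_dir, full_file):
    """Return how many times larger the SHATTER directory is than the FULL file."""
    full_size = Path(full_file).stat().st_size
    if full_size == 0:
        raise ValueError("FULL output file is empty")
    return total_size_bytes(shatter_dir) / full_size
```

Calling `size_ratio("/path/to/output", "/path/to/output.nt")` with the paths from the parameters above would give the blow-up factor between the two modes.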

fatestigma commented 6 years ago

Thanks, that's good news for me. Reading those 4.3 GB of Turtle files is a nightmare for me and my laptop. 😃

szeke commented 6 years ago

Something seems amiss: N-Triples is more verbose because it expands all URIs, so I don't expect the N-Triples file to be smaller. I wonder if URIs are being reused in the shatter files inadvertently, so that when the triples are consolidated in the N-Triples file many triples are getting merged unexpectedly.
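
The merging effect described here can be sketched in a few lines of Python. An RDF graph is a set of triples, so identical triples collapse when separate files are combined. The URIs and property names below are hypothetical and are not taken from real AIF output; this illustrates the hypothesis only and is not a diagnosis of the converter.

```python
# Each triple is modelled as a (subject, predicate, object) tuple of strings.
# All example.org URIs are hypothetical and exist only for illustration.
EX = "http://example.org/aida/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def merge_graphs(*graphs):
    """Union several graphs; a graph is a set, so identical triples collapse."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


# Two shards that both use the URI "entity-1", possibly for different things.
shard_a = {
    (EX + "entity-1", RDF_TYPE, EX + "Person"),
    (EX + "entity-1", EX + "hasName", "Alice"),
}
shard_b = {
    (EX + "entity-1", RDF_TYPE, EX + "Person"),  # identical triple, stored once after merging
    (EX + "entity-1", EX + "hasName", "Bob"),    # now attached to the same node as "Alice"
}

triples_across_shards = len(shard_a) + len(shard_b)
merged = merge_graphs(shard_a, shard_b)
print(triples_across_shards, len(merged))  # prints "4 3"
```

If a URI is reused across shards for what should be distinct nodes, the consolidated file is both smaller and wrong: the repeated type triple is stored once, and both names end up on a single entity.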

gabbard commented 6 years ago

@fatestigma @szeke: This was most likely due to #24. This is fixed on the branch #23, which should get merged to master in the next couple of days (but you can go ahead and use the branch).

gabbard commented 5 years ago

Assuming this is fixed by #24 since no further problems have been observed.