Closed by abh3hu 6 years ago
The pst files have to go in /vagrant/newman-ingester/TestPST/pst/TestPST1/ (not /home/vagrant/...). Let me know if this fixes your problem. Another option is to try ingesting from the command line.
I tried ingesting from the command line. The datasets appear, but none of the widgets have updated.
Below are a few troubleshooting steps you can try. We will probably need the log files to determine the problem. Another option is to attach a pst file you're trying to ingest.
==> default: Forwarding ports...
    default: 80 (guest) => 80 (host) (adapter 1)
    default: 443 (guest) => 443 (host) (adapter 1)
    default: 8787 (guest) => 8787 (host) (adapter 1)
    default: 9200 (guest) => 9200 (host) (adapter 1)
    default: 4040 (guest) => 4040 (host) (adapter 1)
    default: 3000 (guest) => 3000 (host) (adapter 1)
    default: 5984 (guest) => 5984 (host) (adapter 1)
    default: 5601 (guest) => 5601 (host) (adapter 1)
    default: 5000 (guest) => 5000 (host) (adapter 1)
    default: 22 (guest) => 2222 (host) (adapter 1)
I am trying to ingest a single PST file from the Enron dataset. I have attached it.
I am reinstalling the VM now.
Here is my current status before reinstalling the VM:
* elasticsearch is running
vagrant@vagrant-ubuntu-trusty-64:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            2.5G   12K  2.5G   1% /dev
tmpfs           502M  368K  502M   1% /run
/dev/sda1        40G   26G   13G  68% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            2.5G     0  2.5G   0% /run/shm
none            100M     0  100M   0% /run/user
I do not have any files in that folder.
Here are my ports:
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
    default: Adapter 1: nat
==> default: Forwarding ports...
    default: 8787 (guest) => 8787 (host) (adapter 1)
    default: 1337 (guest) => 1337 (host) (adapter 1)
    default: 9200 (guest) => 9200 (host) (adapter 1)
    default: 5601 (guest) => 5601 (host) (adapter 1)
    default: 4040 (guest) => 4040 (host) (adapter 1)
    default: 5984 (guest) => 5984 (host) (adapter 1)
    default: 22 (guest) => 2222 (host) (adapter 1)
I'll try ingesting the pst file you zipped. The problem could be one of two things now.
First, the ENRON data is old and doesn't follow the email format standard very well; also, most of the ENRON data I found publicly had been scrubbed, with a lot of added EDRM statements. You could follow the steps to create a personal pst file and try ingesting it. I'll also try ingesting the attached pst to rule this situation out.
Second, I just ran into an issue very similar to this. I could start the ingest and the ingest name would show up, but there were no contents. On my system, for whatever reason, docker was inactive. After you ssh into the VM, run "sudo service docker status" to see if it's inactive. Run "sudo service docker start" to make it active.
I started from scratch (i.e. downloaded newman-vm-v2.1.3.box from the link). I then followed the steps in the Quick Start guide (http://sotera.github.io/newman/quick-start/). I copied the specific Enron pst file you posted to the correct location, C:\Users\jsmith\VirtualBoxVMs\newman-ingester\enron\pst\enrontest\swerzbin-m.pst (note: the path may be slightly different on your machine, depending on what you called the case and label). I then used the GUI to ingest the data. This added 341 emails to the Newman application.
So, it's not the file or the VM. That leaves memory/space issues (which, from your post above, don't seem to be the problem) or a Vagrantfile corrupted by multiple attempts. You can delete or rename the existing Vagrantfile and .vagrant folder and regenerate them (i.e. vagrant init newman-vm-v2.1.3 newman-vm-v2.1.3.box, vagrant up, vagrant ssh, tangelo restart). Let me know if that works.
I was able to ingest the graph. Deleting the VM, Vagrantfile, and .vagrant folder helped.
A small bug that happened during ingestion is that the VM would pause several times. I would need to unpause the VM for the ingestion to continue.
I clicked on a topic to see the list of emails, but I do not see a graph.
I'm seeing the same thing on my system. It looks like when this data was scrubbed to remove PII and add the EDRM messages, some important email formatting was deleted. Many of the From fields (senders) are shown as mike.swerzbin@enron.commike.swerzbin@enron.com. The semicolon separator is missing. I'll need to look deeper into the data to see if it's actually missing or if our ingest process dropped the separator. With the From field incorrect on most emails, the graph doesn't get built. If you want to see a graph, select the dataset, then Accounts ranked, and then the account mike.swerzbin@enron.com.
The ENRON dataset psts have been the buggiest for us to ingest. Having said that, I've ingested dozens of them with better results than this particular pst file. If you create your own or use more recent pst files, you will see much better results. We also handle the mbox and eml email formats.
I have never seen my VM just pause. I googled 'why does virtualbox vm pause' and most of the articles hint that it's a memory issue (either too much memory allocated to the VM or other GUI applications causing a resource conflict).
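For anyone who wants to check their own data for the fused sender problem described above, here is a minimal Python sketch. It is not part of Newman's ingest code; the function name `split_fused_sender` is hypothetical, and it only repairs the exact case seen in this thread, where the same address appears twice back to back with no separator. It assumes a properly formatted field uses ";" or "," between addresses.

```python
import re


def split_fused_sender(value: str) -> list[str]:
    """Return the addresses found in a From value.

    Hypothetical helper: handles normally separated lists and the
    exact-duplicate fused case (e.g. "a@x.coma@x.com"). Any other
    value is returned unchanged as a single entry.
    """
    value = value.strip()

    # Normal case: addresses separated by ';' or ','.
    if ";" in value or "," in value:
        return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]

    # Fused case: two '@' signs and the two halves of the string are identical.
    half, remainder = divmod(len(value), 2)
    if value.count("@") == 2 and remainder == 0 and value[:half] == value[half:]:
        return [value[:half]]

    # Anything else is left as-is for manual inspection.
    return [value]


if __name__ == "__main__":
    print(split_fused_sender("mike.swerzbin@enron.commike.swerzbin@enron.com"))
```

A plain email regex is deliberately not used for the fused case: in "enron.commike.swerzbin" the boundary between the domain and the next local part is ambiguous, so the sketch relies on the two halves being identical instead.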
Do you have a dataset that I can download and test with, such as the Schiavo dataset that was preloaded with newman-vm2.1.1? I would like to present an interesting dataset to other developers to show the value of the Newman Project.
Schiavo isn't all that good either. Some people got real excited on the "Right to Life" issue and most of the networks were real shallow. I like using the Jeb Bush dataset. There is both an mbox file and a set of emls. There's a lot of attachments, some with exif/geolocation information. Some of the better search terms include education, hurricane, and money. There are even a couple of Spanish emails where you can show translation capability. I've attached a zip file on my Dropbox. https://www.dropbox.com/s/folphh5172tmf54/jeb%40jeb.org_modified.zip?dl=0
Closing issue - test email set delivered.
I am trying to ingest .pst files from the Enron data set. I put the files in /home/vagrant/newman-ingester/TestPST/pst/TestPST1/.
I go to the GUI and click on "New Dataset..." but a window does not pop up. Upon inspecting the page, there are two errors that appear:
1) Uncaught ReferenceError: app_ingest_email is not defined
2) Uncaught ReferenceError: newman_domain_email is not defined