chraibi opened this issue 5 years ago
In Gitlab by @anna-braun on Mar 2, 2019, 20:16
changed the description
In Gitlab by @chraibi on Mar 3, 2019, 12:21
@anna-braun can you run the following code with your file?
In Gitlab by @chraibi on Mar 3, 2019, 19:40
created branch 100-huge-trajectory-files to address this issue
In Gitlab by @chraibi on Mar 3, 2019, 19:51
@anna-braun is this a txt or an xml file? :thinking:
In Gitlab by @anna-braun on Mar 4, 2019, 08:16
It is txt, but it does not work for xml either.
@anna-braun What's the status?
I guess it still does not work with huge trajectories. We solved the problem by splitting the trajectory file every 10 MB (in jpscore).
@anna-braun Can we close the issue?
No, please keep it open.
jpsreport reads the whole file at once, which is not good for big files of several GB.
@anna-braun @chraibi: Which steps are planned? What suggestions are there to edit the issue?
I thought that a big trajectory file should be read chunk-wise, not all at once. Something like this.
Sounds good. Who has capacity?
Can we realize this for version 8.4?
Probably not. It depends on our capacity (I can't right now) and on how acute it is.
Actually, I have exactly the same problem with my trajectory files. My trajectory files (.txt) range between 200 MB and 500 MB.
To complete the analysis with jpsreport, I currently cut the files after 10,000 frames. This results in data files of about 10-40 MB; otherwise jpsreport chokes on the data and produces a segmentation fault. Therefore, instead of running jpsreport once on the original data file, I have to run it up to 40 times on the cut data.
Any solution to this issue would be greatly appreciated.
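The cutting step described above can be scripted. The following is a minimal sketch, not part of jpscore or jpsreport: it assumes the txt layout shown further down in this thread (header lines starting with #, data columns PersID Frame x/m y/m z/m) and that all rows of one frame are contiguous. The function name split_by_frames and the traj_NN.txt naming are hypothetical. Leading header lines are copied into every part so each part stays a complete trajectory file.

```python
import os
import tempfile


def split_by_frames(path, frames_per_chunk, out_dir=".", out_pattern="traj_{:02d}.txt"):
    """Split a txt trajectory file into parts of at most `frames_per_chunk` frames.

    Hypothetical helper. Assumes '#' marks header/comment lines, the frame number
    is the second whitespace-separated column, and rows of one frame are contiguous,
    so a frame is never split across two parts. Returns the written file names.
    """
    if frames_per_chunk < 1:
        raise ValueError("frames_per_chunk must be >= 1")
    header, written = [], []
    out = None
    frames_in_part, last_frame = 0, None
    with open(path) as src:
        for line in src:
            if line.startswith("#") or not line.strip():
                # remember the leading header so it can be repeated in every part
                if out is None and line.strip():
                    header.append(line)
                continue
            frame = int(line.split()[1])
            if frame != last_frame:
                # a new frame starts: open the next part if the current one is full
                if out is None or frames_in_part == frames_per_chunk:
                    if out is not None:
                        out.close()
                    name = os.path.join(out_dir, out_pattern.format(len(written) + 1))
                    out = open(name, "w")
                    out.writelines(header)
                    written.append(name)
                    frames_in_part = 0
                frames_in_part += 1
                last_frame = frame
            out.write(line)
    if out is not None:
        out.close()
    return written


if __name__ == "__main__":
    # Demo on a tiny synthetic file: 5 frames split into parts of 2 frames each.
    tmp = tempfile.mkdtemp()
    src_path = os.path.join(tmp, "traj.txt")
    with open(src_path, "w") as f:
        f.write("# PersID Frame x/m y/m z/m\n")
        for fr in range(5):
            f.write(f"1 {fr} {0.65 * fr:.4f} 5.0880 0.0000\n")
    print([os.path.basename(p) for p in split_by_frames(src_path, 2, out_dir=tmp)])
```

With frames_per_chunk=10000 this mimics the manual workflow; the resulting part files can then be listed together in one ini file as suggested in the next comment.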
You can read all the trajectory files in a single run:
<trajectories format="txt" unit="m">
<file name="traj_01.txt" />
<file name="traj_02.txt" />
<file name="traj_03.txt" />
<file name="traj_04.txt" />
<file name="traj_05.txt" />
<file name="traj_06.txt" />
<file name="traj_07.txt" />
<file name="traj_08.txt" />
<file name="traj_09.txt" />
<file name="traj_10.txt" />
<file name="traj_11.txt" />
<file name="traj_12.txt" />
<file name="traj_13.txt" />
<file name="traj_14.txt" />
<file name="traj_15.txt" />
<file name="traj_16.txt" />
<file name="traj_17.txt" />
<file name="traj_18.txt" />
<file name="traj_19.txt" />
<file name="traj_20.txt" />
<file name="traj_21.txt" />
<file name="traj_22.txt" />
<file name="traj_23.txt" />
<file name="traj_24.txt" />
<file name="traj_25.txt" />
<file name="traj_26.txt" />
<file name="traj_27.txt" />
<file name="traj_28.txt" />
<file name="traj_29.txt" />
<file name="traj_30.txt" />
<file name="traj_31.txt" />
<file name="traj_32.txt" />
<file name="traj_33.txt" />
<file name="traj_34.txt" />
<file name="traj_35.txt" />
<file name="traj_36.txt" />
<file name="traj_37.txt" />
<file name="traj_38.txt" />
<file name="traj_39.txt" />
<file name="traj_40.txt" />
<file name="traj_41.txt" />
</trajectories>
instead of
<trajectories format="txt" unit="m">
<file name="traj_01.txt" />
</trajectories>
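Writing 41 file entries by hand is tedious and error-prone. A small sketch that builds the same trajectories element from every file matching a pattern in one directory; the helper name trajectories_block, the traj_*.txt pattern and the directory are assumptions, while the element and attribute names are copied from the snippet above. Only base names are written, as in that snippet.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as ET


def trajectories_block(directory, pattern="traj_*.txt", fmt="txt", unit="m"):
    """Return a <trajectories> element listing every matching file in `directory`.

    Hypothetical helper. Files are sorted by name, so traj_01.txt, traj_02.txt, ...
    keep their order.
    """
    root = ET.Element("trajectories", format=fmt, unit=unit)
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        ET.SubElement(root, "file", name=os.path.basename(path))
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Demo: a temporary directory with three part files and one unrelated file.
    tmp = tempfile.mkdtemp()
    for name in ("traj_02.txt", "traj_01.txt", "traj_03.txt", "notes.csv"):
        open(os.path.join(tmp, name), "w").close()
    print(trajectories_block(tmp))
```

The printed block can be pasted into the ini file in place of the hand-written list.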
@gjaeger Thank you. This works for me!
This raises the question of how the speed is calculated for shared input data.
For comparison:
trajectory for one agent in one file:
# PersID Frame x/m y/m z/m
1 0 0.8480 5.0880 0.0000
1 1 1.4850 5.0880 0.0000
...
1 1537 999.8850 5.0880 0.0000
based on IFD_I-file:
#Frame PersID x/m y/m z/m Individual density(m^(-2)) Individual velocity(m/s)
00000 1 0.8480 5.0880 0.0000 1.0000 1.2740
00001 1 1.4850 5.0880 0.0000 1.0000 1.2870
00002 1 2.1350 5.0880 0.0000 1.0000 1.3000
...
The v(t)-diagram for the first frames (frame 1 to 767):
# PersID Frame x/m y/m z/m
1 0 0.8480 5.0880 0.0000
1 1 1.4850 5.0880 0.0000
1 2 2.1350 5.0880 0.0000
1 3 2.7850 5.0880 0.0000
...
1 766 498.7350 5.0880 0.0000
1 767 499.3850 5.0880 0.0000
based on:
#Frame PersID x/m y/m z/m Individual density(m^(-2)) Individual velocity(m/s)
00000 1 0.8480 5.0880 0.0000 1.0000 1.2740
00001 1 1.4850 5.0880 0.0000 1.0000 1.2870
00002 1 2.1350 5.0880 0.0000 1.0000 1.3000
The v(t)-diagram for the second part (frame 768 to 1537):
# PersID Frame x/m y/m z/m
1 768 500.0350 5.0880 0.0000
1 769 500.6850 5.0880 0.0000
1 770 501.3350 5.0880 0.0000
...
1 1536 999.2350 5.0880 0.0000
1 1537 999.8850 5.0880 0.0000
based on:
#Frame PersID x/m y/m z/m Individual density(m^(-2)) Individual velocity(m/s)
00768 1 500.0350 5.0880 0.0000 1.0000 1.3000
00769 1 500.6850 5.0880 0.0000 1.0000 1.3000
00770 1 501.3350 5.0880 0.0000 1.0000 1.3000
I would have expected the speed to increase at the beginning of the second part as well.
@chraibi Does jpsreport have a memory function?
@schroedtert
$ lldb /Users/gjaeger/Documents/hubs/JuPedSim_github/jpsreport/bin/jpsreport ini_2019.xml
(lldb) target create "/Users/gjaeger/Documents/hubs/JuPedSim_github/jpsreport/bin/jpsreport"
Current executable set to '/Users/gjaeger/Documents/hubs/JuPedSim_github/jpsreport/bin/jpsreport' (x86_64).
(lldb) settings set -- target.run-args "ini_2019.xml"
(lldb) run
Process 8588 launched: '/Users/gjaeger/Documents/hubs/JuPedSim_github/jpsreport/bin/jpsreport' (x86_64)
----
JuPedSim - JPSreport
Current date : Thu Sep 05 10:49:51 2019
Version : 0.8.4
Compiler : g++ (8.3.0)
Commit hash : v0.8.3-119-gbc8ced4
Commit date : Wed Aug 14 03:47:13 2019
Branch : develop
Python : /opt/local/bin/python3.6 (3.6.9)
----
INFO: Parsing the ini file <ini_2019.xml>
INFO: logfile </Users/gjaeger/Documents/Simulationen/Mira/log.txt>
lineNr 100000
...
lineNr 10400000
Process 8588 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGKILL
frame #0: 0x00007fff6e4c5f49 libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 41
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell:
-> 0x7fff6e4c5f49 <+41>: rep movsb (%rsi), %es:(%rdi)
0x7fff6e4c5f4b <+43>: popq %rbp
0x7fff6e4c5f4c <+44>: retq
0x7fff6e4c5f4d <+45>: cmpq %rdi, %rsi
Target 0: (jpsreport) stopped.
You can read the trajectory-files at the same time (listing traj_01.txt to traj_41.txt in <trajectories> instead of only traj_01.txt).
I think it's enough to specify a directory. No need to write all the names of files one by one.
Any solution to this issue would be greatly appreciated.
There is a solution: jpsreport reads all files and stores their content at once, which is not a good idea for large files.
This needs to be changed by reading the data in chunks and processing them chunk by chunk. See also here for some ideas.
Anyone willing to tackle this is welcome to contribute.
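To illustrate the chunk-by-chunk idea (in Python rather than jpsreport's own code): a generator that reads a txt trajectory file line by line and yields one frame at a time, so only the current frame is held in memory instead of the whole file. It assumes the PersID Frame x y z column layout shown in this thread and that all rows of a frame are contiguous; iter_frames is a hypothetical name, not a jpsreport function.

```python
import io
from typing import Iterator, List, TextIO, Tuple

Row = Tuple[int, float, float, float]  # (PersID, x, y, z)


def iter_frames(stream: TextIO) -> Iterator[Tuple[int, List[Row]]]:
    """Yield (frame, rows) per frame while holding only one frame in memory.

    Assumes '#' starts a comment line and rows of one frame are contiguous.
    """
    current_frame, rows = None, []
    for line in stream:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        ped_id, frame = int(fields[0]), int(fields[1])
        x, y, z = map(float, fields[2:5])
        if frame != current_frame and rows:
            yield current_frame, rows  # the previous frame is complete
            rows = []
        current_frame = frame
        rows.append((ped_id, x, y, z))
    if rows:
        yield current_frame, rows


if __name__ == "__main__":
    sample = io.StringIO(
        "# PersID Frame x/m y/m z/m\n"
        "1 0 0.8480 5.0880 0.0000\n"
        "2 0 1.0000 2.0000 0.0000\n"
        "1 1 1.4850 5.0880 0.0000\n"
    )
    for frame, rows in iter_frames(sample):
        print(frame, len(rows))
```

With a real file, the same loop runs over open("traj.txt"). Quantities that need neighbouring frames, such as velocities, would additionally require the consumer to keep a small window of recent frames (for example a collections.deque with a fixed maxlen) rather than the whole trajectory.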
This raises the question of how the speed is calculated for shared input data. @chraibi Does jpsreport have a memory function?
I don't know what you mean by shared input, but I don't think that jpsreport has a "memory function".
@chraibi By shared input I mean the division into individual files. Are the split trajectory files read independently? If so, I cannot explain why the velocity/speed is calculated without losing the knowledge of previous frames. See my example above.
I think jpsreport reads one file at a time, processes it, and writes the results out. It then repeats this for the other files, independently of each other.
What @mirakuepper is doing is a smart hack, but of course it only works if you are OK with these discontinuities in the results.
If each file were processed independently, I would expect the speed at the beginning of the second part (see the v(t)-diagram for the second part) to increase as well (as in the v(t)-diagram for the first part). This is not the case. The analysis shows no discontinuities in the movement speed.
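The IFD_I excerpts above are consistent with a simple per-frame speed model: a central difference over one neighbouring frame on each side, a one-sided difference at the first and last frame of a file, and a frame rate of 2 frames/s. The frame rate, the one-frame window and the boundary rule are inferred from the numbers in the excerpts, not taken from jpsreport's source, so this is a hypothetical model; speeds is a hypothetical helper. Under this model it reproduces 1.274, 1.287 and 1.300 m/s for frames 0 to 2 and 1.300 m/s for frames 768 to 770, even with each part processed on its own. The increase at the start of the first part comes from the positions themselves (the agent moves 0.637 m, then 0.650 m per frame), while at frame 768 the agent already moves a constant 0.650 m per frame, so a one-sided difference gives the same 1.300 m/s and no jump appears.

```python
def speeds(xs, fps, window=1):
    """Per-frame speed (m/s) of one agent moving along x.

    Hypothetical model inferred from the IFD_I excerpts, not jpsreport's code:
    central difference over `window` frames on each side; near the ends of the
    list only the available neighbours are used (one-sided difference).
    """
    if len(xs) < 2:
        raise ValueError("need at least two frames")
    n = len(xs)
    result = []
    for t in range(n):
        lo, hi = max(t - window, 0), min(t + window, n - 1)
        result.append((xs[hi] - xs[lo]) * fps / (hi - lo))
    return result


# x positions copied from the trajectory excerpts above
first_part = [0.8480, 1.4850, 2.1350, 2.7850]  # frames 0-3
second_part = [500.0350, 500.6850, 501.3350]    # frames 768-770

print([round(v, 3) for v in speeds(first_part, fps=2)])
print([round(v, 3) for v in speeds(second_part, fps=2)])
```

The excerpts are truncated, so the last value of each list uses a one-sided difference here; for this agent that also gives 1.300 m/s. A discontinuity at a file boundary would only become visible if the agent's speed changed right at the cut.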
In Gitlab by @anna-braun on Mar 1, 2019, 10:56 [origin]
I wanted to do some calculations with jpsreport, but it produced a segmentation fault:
The trajectory file is about 2 GB.
You can find the files (trajectory, report, log) here:
https://fz-juelich.sciebo.de/s/6BJJFC84ugUlm8r