Possible bug in pull request 1508, json parsing?

smathermather commented 2 years ago

How did you install ODM? (Docker, installer, natively, ...)?

Docker

What is the problem?

Possible introduction of bug with https://github.com/OpenDroneMap/ODM/pull/1508

What should be the expected behavior? If this is a feature request, please describe in detail the changes you think should be made to the code, citing files and lines where changes should be made, if possible.

I need to investigate further, but my hunch is that larger JSON files aren't being parsed appropriately (a cursory look at the pull request changes plus the error below are the source of the hunch).

How can we reproduce this? What steps did you do to trigger the problem? If this is an issue with processing a dataset, YOU MUST include a copy of your dataset uploaded on Google Drive or Dropbox (otherwise we cannot reproduce this).

Go big, maybe 4000 image big. I can share a dataset as needed.

[INFO]    Export reconstruction stats
[INFO]    running "/code/SuperBuild/install/bin/opensfm/bin/opensfm" compute_statistics --diagram_max_points 100000 "/var/www/data/81380890-a912-4365-bad8-2e5ae0e79cc8/opensfm"
File "/code/SuperBuild/install/bin/opensfm/bin/opensfm_main.py", line 25, in <module>
commands.command_runner(
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/command_runner.py", line 38, in command_runner
command.run(data, args)
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/command.py", line 13, in run
self.run_impl(data, args)
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/compute_statistics.py", line 12, in run_impl
compute_statistics.run_dataset(dataset, args.diagram_max_points)
File "/code/SuperBuild/install/bin/opensfm/opensfm/actions/compute_statistics.py", line 25, in run_dataset
stats_dict = stats.compute_all_statistics(data, tracks_manager, reconstructions)
File "/code/SuperBuild/install/bin/opensfm/opensfm/stats.py", line 631, in compute_all_statistics
stats["processing_statistics"] = processing_statistics(data, reconstructions)
File "/code/SuperBuild/install/bin/opensfm/opensfm/stats.py", line 467, in processing_statistics
obj = io.json_load(fin)
File "/code/SuperBuild/install/bin/opensfm/opensfm/io.py", line 1020, in json_load
return json.load(fp)
File "/usr/lib/python3.9/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 1
Traceback (most recent call last):
File "/code/stages/odm_app.py", line 88, in execute
self.first_stage.run()
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 352, in run
self.process(self.args, outputs)
File "/code/stages/run_opensfm.py", line 67, in process
octx.export_stats(self.rerun())
File "/code/opendm/osfm.py", line 540, in export_stats
self.run("compute_statistics --diagram_max_points 100000")
File "/code/opendm/osfm.py", line 34, in run
system.run('"%s" %s "%s"' %
File "/code/opendm/system.py", line 106, in run
raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 1

===== Done, human-readable information to follow... =====

[ERROR]   Uh oh! Processing stopped because of strange values in the reconstruction. This is often a sign that the input data has some issues or the software cannot deal with it. Have you followed best practices for data acquisition? See https://docs.opendronemap.org/flying/

smathermather commented 2 years ago

Currently rebuilding the docker image from a commit prior to the possibly offending pull request, and really hoping I don't have to rerun from scratch, as that will be another 50 hours if I do. :smile:

smathermather commented 2 years ago

Hmm, looking closer, I don't see any changes to json parsing in the pull request, so I'm not sure what is happening. More as I see it.

smathermather commented 2 years ago

Alas, fails still. Removed reconstruction.json. 20 hours to get through incremental reconstruction... but any insights welcome in the interim (processed this dataset a few months ago with identical settings... yada).

pierotofy commented 2 years ago

Look at the opensfm/reports folder; there should be some files in there:

Are they valid JSON?
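
One way to answer that is to try parsing every file in the folder and list the ones that fail. This is a minimal sketch, not part of ODM or OpenSfM: the helper name `check_json_reports` is made up for illustration, and the folder path is whatever your dataset uses. A zero-byte file produces exactly the "Expecting value: line 1 column 1 (char 0)" message seen in the traceback above.

```python
import json
from pathlib import Path


def check_json_reports(reports_dir):
    """Try to parse every *.json file in reports_dir.

    Returns a dict mapping file name to None (parsed fine) or an error string.
    Note: json.load reads each file fully into memory.
    """
    results = {}
    for path in sorted(Path(reports_dir).glob("*.json")):
        try:
            with open(path, "r", encoding="utf-8") as f:
                json.load(f)
            results[path.name] = None
        except (json.JSONDecodeError, UnicodeDecodeError, OSError) as e:
            # Keep the exception type so empty, garbled and unreadable
            # files can be told apart in the output.
            results[path.name] = f"{type(e).__name__}: {e}"
    return results
```

Calling `check_json_reports("opensfm/reports")` from the dataset folder and printing the result should show at a glance which report, if any, is empty or damaged.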

pierotofy commented 2 years ago

Also note that split-merge submodels should always be run with --skip-report, since we don't have support for generating reports. Maybe that's the culprit. https://github.com/OpenDroneMap/ODM/blob/master/opendm/osfm.py#L634

smathermather commented 2 years ago

No split-merge this time; I have plenty of RAM for 4k images. The reports parse, starting with features.json:

import json

# Load the features report and print one entry per image
with open('features.json') as f:
    data = json.load(f)

for i in data['image_reports']:
    print(i)

output:

...
{'image': 'EP-11-29573_0116_0125.JPG', 'num_features': 16354, 'wall_time': 9.405448540986981}
{'image': 'EP-11-29573_0114_0245.JPG', 'num_features': 16561, 'wall_time': 15.17158963100519}

import json

# Load the reconstruction report and print one entry per reconstruction
with open('reconstruction.json') as f:
    data = json.load(f)

for i in data['reconstructions']:
    print(i)

returns

{'bootstrap': {'image_pair': ['EP-11-29573_0125_0226.JPG', 'EP-11-29573_0125_0227.JPG'], 'common_tracks': 68, 'two_view_reconstruction': {'5_point_inliers': 65, 'plane_based_inliers': 65, 'method': 'plane_based'}, 'triangulated_points': 65, 'decision': 'Success', 'memory_usage': 33538043904}, 'grow': {'steps': [{'images': ['EP-11-29573_0125_0228.JPG'], 'resection': {'num_common_points': 10, 'num_inliers': 10, 'shots': ['EP-11-29573_0125_0228.JPG']}, 'memory_usage': 33538043904, 'triangulated_points': 83, 'bundle': {'brief_report': 'Ceres Solver Report: Iterations: 23, Initial cost: 1.427301e+03, Final cost: 1.746516e+01, Termination: CONVERGENCE', 'wall_times': {'setup': 0.0008, 'run': 0.112772, 'teardown': 0.000746}}, 'retriangulation': {'num_points_before': 147, 'num_points_after': 148, 'wall_time': 0.10436228799517266}, 'bundle_after_retriangulation': {'brief_report': 'Ceres Solver Report: Iterations: 7, Initial cost: 1.934065e+01, Final cost: 1.884404e+01, Termination: CONVERGENCE', 'wall_times': {'setup': 0.000786, 'run': 0.041696, 'teardown': 0.00059}}}]}}
{'bootstrap': {'image_pair': ['EP-11-29573_0125_0223.JPG', 'EP-11-29573_0125_0224.JPG'], 'common_tracks': 80, 'two_view_reconstruction': {'5_point_inliers': 77, 'plane_based_inliers': 66, 'method': '5_point'}, 'triangulated_points': 77, 'decision': 'Success', 'memory_usage': 33538043904}, 'grow': {'steps': []}}
{'bootstrap': {'image_pair': ['EP-11-29573_0114_0094.JPG', 'EP-11-29573_0114_0095.JPG'], 'common_tracks': 72, 'two_view_reconstruction': {'5_point_inliers': 68, 'plane_based_inliers': 64, 'method': '5_point'}, 'triangulated_points': 68, 'decision': 'Success', 'memory_usage': 33538043904}, 'grow': {'steps': []}}

import json

# Load the tracks report and print each edge of the view graph
with open('tracks.json') as f:
    data = json.load(f)

for i in data['view_graph']:
    print(i)

Yields

...
['EP-11-29573_0113_0177.JPG', 'EP-11-29573_0114_0246.JPG', 15]
['EP-11-29573_0114_0246.JPG', 'EP-11-29573_0115_0080.JPG', 77]
['EP-11-29573_0133_0187.JPG', 'EP-11-29573_0133_0233.JPG', 4]
['EP-11-29573_0114_0246.JPG', 'EP-11-29573_0115_0079.JPG', 16]

Matches isn't json, but I think that changed to a binary file a while back.

smathermather commented 2 years ago

Now running two tests: one on a build of git checkout f1fc89e517461684f6abac6ebba2a416fc7b858e, with opensfm/reports/features.json removed but reports enabled, and one on the current opendronemap/odm docker image with reports disabled.

Update: the first one failed with the same error whether or not reports/features.json was removed, so I can rule out the suspected pull request above. The second one is of course happily humming along at export_geocoords.

opensfm/features.json appears to be valid (inspected with more and tail). Happy to validate it with python, but a recommendation for a way to validate json without loading the whole 2.3GB file into memory would be appreciated. :laughing:
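
Short of a streaming parser (the third-party ijson package is one option), a cheap first pass is to read only the first and last few kilobytes and check that the file is non-empty and that its opening and closing brackets match. That is roughly what more and tail do by eye. The sketch below is an assumption-laden sanity check, not a full validation: the function name and the 4096-byte window are arbitrary, and damage in the middle of the file goes undetected.

```python
import os


def quick_json_shape_check(path, window=4096):
    """Cheap sanity check that reads at most 2 * window bytes.

    Confirms the file is non-empty, starts with '{' or '[' and ends with
    the matching bracket. Returns (ok, message).
    """
    size = os.path.getsize(path)
    if size == 0:
        return False, "file is empty"
    with open(path, "rb") as f:
        head = f.read(window).lstrip()
        f.seek(max(0, size - window))
        tail = f.read(window).rstrip()
    if not head or not tail:
        return False, "only whitespace found at the start or end"
    closers = {b"{": b"}", b"[": b"]"}
    first, last = head[:1], tail[-1:]
    if first not in closers:
        return False, f"unexpected first byte {first!r}"
    if last != closers[first]:
        return False, f"starts with {first!r} but ends with {last!r}"
    return True, "start and end look like a complete JSON document"
```

A file that passes this check can still be invalid, but a file that fails it (empty, or cut off before the closing bracket) is certainly not loadable.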

pierotofy commented 2 years ago

"Matches isn't json"

It should be; we changed the tracks.csv file to binary, but these should still be JSON.

smathermather commented 2 years ago

Attempted a rerun, and now I'm getting a failure much earlier, in matching:

2022-07-30 06:40:47,777 INFO: Matched 44984 pairs (brown-brown: 44984) in 5160.317014105996 seconds (0.11471449890369909 seconds/pair).
Traceback (most recent call last):
File "/code/SuperBuild/install/bin/opensfm/bin/opensfm_main.py", line 25, in <module>
commands.command_runner(
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/command_runner.py", line 38, in command_runner
command.run(data, args)
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/command.py", line 13, in run
self.run_impl(data, args)
File "/code/SuperBuild/install/bin/opensfm/opensfm/commands/match_features.py", line 13, in run_impl
match_features.run_dataset(dataset)
File "/code/SuperBuild/install/bin/opensfm/opensfm/actions/match_features.py", line 15, in run_dataset
matching.save_matches(data, images, pairs_matches)
File "/code/SuperBuild/install/bin/opensfm/opensfm/matching.py", line 151, in save_matches
data.save_matches(im1, im1_matches)
File "/code/SuperBuild/install/bin/opensfm/opensfm/dataset.py", line 381, in save_matches
with self.io_handler.open(self._matches_file(image), "wb") as fw:
File "/code/SuperBuild/install/bin/opensfm/opensfm/io.py", line 1462, in open
return open(*args, **kwargs)
OSError: [Errno 117] Structure needs cleaning: '/var/www/data/5623d897-c816-4b62-8fd7-534ad95f1943/opensfm/matches/EP-11-29573_0115_0144.JPG_matches.pkl.gz'

===== Dumping Info for Geeks (developers need this to fix bugs) =====
Child returned 1
Traceback (most recent call last):
File "/code/stages/odm_app.py", line 88, in execute
self.first_stage.run()
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 371, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 352, in run
self.process(self.args, outputs)
File "/code/stages/run_opensfm.py", line 35, in process
octx.feature_matching(self.rerun())
File "/code/opendm/osfm.py", line 416, in feature_matching
self.match_features(rerun)
File "/code/opendm/osfm.py", line 421, in match_features
self.run('match_features')
File "/code/opendm/osfm.py", line 34, in run
system.run('"%s" %s "%s"' %
File "/code/opendm/system.py", line 106, in run
raise SubprocessException("Child returned {}".format(retcode), retcode)
opendm.system.SubprocessException: Child returned 1

===== Done, human-readable information to follow... =====

[ERROR]   Uh oh! Processing stopped because of strange values in the reconstruction. This is often a sign that the input data has some issues or the software cannot deal with it. Have you followed best practices for data acquisition? See https://docs.opendronemap.org/flying/

pierotofy commented 2 years ago

Mm, "That is strongly indicative of file-system corruption"

https://unix.stackexchange.com/questions/330742/cannot-remove-file-structure-needs-cleaning
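
On Linux, errno 117 is EUCLEAN, which is where the "Structure needs cleaning" text comes from. As a rough way to see how much of a dataset folder is affected, the sketch below walks a directory, stats every entry and collects the paths that raise that error. The function names are made up for illustration, and this only locates affected paths: a stat that succeeds does not prove the filesystem is healthy, and the actual repair is a filesystem check on the unmounted volume, which this script does not attempt.

```python
import errno
import os

# On Linux, errno 117 is EUCLEAN ("Structure needs cleaning").
# Fall back to the literal value on platforms where the name is missing.
EUCLEAN = getattr(errno, "EUCLEAN", 117)


def is_fs_corruption(exc):
    """True if exc is an OSError carrying EUCLEAN."""
    return isinstance(exc, OSError) and exc.errno == EUCLEAN


def find_unreadable_entries(root):
    """Walk root, stat every entry, and return paths that raise EUCLEAN."""
    damaged = []

    def on_walk_error(exc):
        # os.walk calls this when it cannot list a directory.
        if is_fs_corruption(exc):
            damaged.append(exc.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as exc:
                if is_fs_corruption(exc):
                    damaged.append(path)
    return damaged
```

Running `find_unreadable_entries` on the dataset's opensfm folder would list any entries the kernel already refuses to touch, which helps decide whether earlier outputs can be trusted after the repair.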

smathermather commented 2 years ago

That may explain a few things I'm seeing, and here I am with file system corruption, blaming pull requests.
