astrorama / SourceXtractorPlusPlus

SourceXtractor++, the next generation SExtractor
https://astrorama.github.io/SourceXtractorPlusPlus/
GNU Lesser General Public License v3.0

Last object takes way too much time #382

Closed mkuemmel closed 2 years ago

mkuemmel commented 3 years ago

In some runs it happened that, after ~50k objects had finished within two days, the last object kept running for at least 12 more hours. This does not make sense, especially since there was no grouping involved in those runs.

mkuemmel commented 3 years ago

Placed all necessary files in problem382.tgz on irods.

mkuemmel commented 3 years ago

Just wanted to check the status of this issue after the newest code updates. I ran develop on it. It crashed after some hours. The RAM consumption is: p382_VSZ

I guess it just used up all the RAM and then crashed. Is it possible that allocating the levmar workspace in SE++ drives the memory usage up?

mkuemmel commented 3 years ago

The current situation is described here: https://github.com/astrorama/SourceXtractorPlusPlus/issues/361#issuecomment-921588312 and https://github.com/astrorama/SourceXtractorPlusPlus/issues/361#issuecomment-921595111

To be continued here...

mkuemmel commented 3 years ago

I was able to identify the object that finishes last. In the area vs. iterations plot it is the marked object in the upper right: long_duration

It is by far the largest object that needs all iterations. The other large objects take a long time as well, but do not need that many iterations.

Question: once an object is in the model fitting, is it actively worked on until it finishes?

ayllon commented 3 years ago

> Question: once an object is in the model fitting, is it actively worked on until it finishes?

Yes. Sources are assigned to worker threads, and the worker will not pick another one until every property is done.
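
For illustration, a minimal sketch of that scheduling behaviour (in Python with made-up names; the actual pipeline is multithreaded C++): each worker takes one source from a shared queue and computes all of its properties before picking up the next one.

```python
import queue
import threading

def worker(sources, compute_property, property_names):
    # Each worker stays on its current source until every property is
    # done, even if one property (e.g. model fitting) takes hours.
    while True:
        source = sources.get()
        if source is None:  # sentinel: no more work
            break
        for name in property_names:
            source[name] = compute_property(source, name)

def run_pipeline(source_list, compute_property, property_names, n_threads=4):
    q = queue.Queue()
    threads = [threading.Thread(target=worker,
                                args=(q, compute_property, property_names))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for source in source_list:
        q.put(source)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
```

A source stuck in a long model fit therefore blocks exactly one worker, and the run only ends once that fit completes.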

ayllon commented 3 years ago

There are several things I don't understand here. This is what I get when I try to run on this data set:

2021-09-20T10:04:13CEST SourceXtractor FATAL : ExpSigmoidConverter: min_value is not a positive value!

I am guessing your compilation flags aren't the same and -ffast-math is breaking something here. Since I have access to your cluster, let me check there.

mkuemmel commented 3 years ago

The files for this run are in: /home/mkuemmel/SEPP_NIRJ/problem_382V0.16

I checked five times, but the config files (python and ascii) and the run script are still identical to the mid-July versions in the tarball problem382.tgz mentioned above. We usually saw the "ExpSigmoidConverter" error when using static start values in the python configuration.

With the unsorted output it is possible to determine when an object entered the queue and when it finished. The object that finished hours later than all the others here (red curve): https://github.com/astrorama/SourceXtractorPlusPlus/issues/361#issuecomment-922499994

was detected roughly in the middle of the field. This means the 150 iterations on the 92k pixels took ~15 hours. Right now I am running a Disk+Bulge fit to see whether the same object has an overhang there as well (I think so). My next step would be to run on smaller cutouts around this object and vary the maximum number of iterations.

Maybe 150 iterations on a 100k pixel object just take 15h, and if this object happens to be detected at the end, the run takes 15h longer and it's simply bad luck.

ayllon commented 3 years ago

I am comparing your logs and mine, and my build detects way more sources, as you already pointed out. The configurations look identical, though.

I need to figure out what's going on 😞

mkuemmel commented 3 years ago

Let me know if I can help.

mkuemmel commented 3 years ago

From the logs I computed the run time [h] for each object. It is in the column "runtime" of this table: https://deepdip.iap.fr/#item/61489f18551d8b7d8200ed82

The run time is not very accurate at the low end; consequently, half of the sources have negative (though small) run times.

There is a correlation between the object area, the number of iterations and the run time, though not a strong one. I'll prepare some plots for Thursday and we can discuss then, but the first impression is that this large overhang of the last object could be "normal".
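
For reference, a sketch of how such per-object runtimes and correlations can be derived from the table; the file and column names ("start", "finish", "area", "iterations") are assumptions, not the actual schema.

```python
import pandas as pd

# Hypothetical export of the table linked above, one row per object.
df = pd.read_csv("objects.csv", parse_dates=["start", "finish"])
df["runtime"] = (df["finish"] - df["start"]).dt.total_seconds() / 3600.0  # hours

# Coarse timestamps make small runtimes unreliable, some even negative.
print("fraction of negative runtimes:", (df["runtime"] < 0).mean())

# Correlation of detection area and iteration count with the runtime.
print(df[["area", "iterations", "runtime"]].corr())
```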

ayllon commented 3 years ago

> I need to figure out what's going on

It seems the RMS map stored in irods is in fact the detection image too :/

db8dcc21dc5226dbe37ec0ffa4411b87  EUC_MER_BGSUB-MOSAIC-VIS_TILE79171-4425D5_20210512T234646.426084Z_00.00.fits
db8dcc21dc5226dbe37ec0ffa4411b87  EUC_MER_MOSAIC-VIS-RMS_TILE79171-FEB195_20210512T213639.952919Z_00.00.fits

No wonder my output is garbled. It seems the checksum should be 4c4fc84bea7a999879b1dc5ee2089386

I will get the file from eucgu94
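
For future reference, such a duplicated upload can be caught by comparing the digests directly, e.g. with Python's hashlib (the equivalent of the md5sum output above):

```python
import hashlib

def md5(path, chunk_size=1 << 20):
    # Stream the file so multi-GB FITS images don't need to fit in RAM.
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

detection = md5("EUC_MER_BGSUB-MOSAIC-VIS_TILE79171-4425D5_20210512T234646.426084Z_00.00.fits")
rms = md5("EUC_MER_MOSAIC-VIS-RMS_TILE79171-FEB195_20210512T213639.952919Z_00.00.fits")
if detection == rms:
    print("the RMS map is a copy of the detection image!")
```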

mkuemmel commented 3 years ago

I am really sorry, I am not sure how this happened. The irods commands are a bit cumbersome. I think I did:

iput vis_data vis_rms

but this uploads "vis_data" under the name "vis_rms" and does not upload both "vis_data" and "vis_rms".

I corrected it nevertheless. The same issue was also present in
/euclid-fr/home/pfmerwrite/SEPP/NIR/EUC_MER_MOSAIC-NIRSTACK-RMS_TILE79171-9A6C49_20210512T233926.338875Z_00.00.fits

This is corrected as well.

mkuemmel commented 3 years ago

As discussed here https://github.com/astrorama/SourceXtractorPlusPlus/issues/382#issuecomment-922765067 the object with the 92k pixels takes ~13.3 hours of runtime. I created two smaller stamp images of 1500x1200 and 3500x3200 pixels around this big object and used those as detection images. With the smaller image the object needed only 55 iterations and 2.5h to finish, with the larger stamp 150 iterations and 13h. Two conclusions:

- The 2.5h for 55 iterations versus 13h for 150 iterations is a bit strange, but this can be checked by running with different max_iters (see the sketch below).
- Independent of this, it would be good to know whether the runtimes for this object make sense at all.
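
A sketch of how that max_iters sweep could be scripted. It assumes one python measurement config per limit (each differing only in its set_max_iterations() call) and drives the binary via subprocess; the config file names and the exact CLI options are assumptions here, not verified against this setup.

```python
import subprocess
import time

# Each python config is assumed to differ only in its set_max_iterations() call.
configs = [(25, "sersic_25.py"), (55, "sersic_55.py"),
           (100, "sersic_100.py"), (150, "sersic_150.py")]

for max_iter, py_config in configs:
    t0 = time.time()
    subprocess.run(["sourcextractor++",
                    "--config-file", "sersic.config",   # hypothetical name
                    "--python-config-file", py_config],
                   check=True)
    hours = (time.time() - t0) / 3600.0
    print(f"max_iterations={max_iter}: {hours:.2f} h")
```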

ayllon commented 3 years ago

Hi Martin,

I am surprised by the difference. Does the detection size of the object remain the same on both cutouts? Because that should be the only thing that matters (I think).

mkuemmel commented 3 years ago

In the three images (full, 1500x1200, 3500x3200) the area is identical in two cases (92668 pixels) and one pixel smaller in the third. All other object properties from the detection images also agree to within <10^-4 or so. Still, the model fitting runs into a totally different solution for the 55-iteration run (no idea why).

mkuemmel commented 3 years ago

I uploaded onto irods:

EUC_MER_MOSAIC-VIS-RMS_TILE79171-SM3.fits
EUC_MER_BGSUB-MOSAIC-VIS_TILE79171-SM3.fits
run_Ser_SM3.sh

EUC_MER_MOSAIC-VIS-RMS_TILE79171-SM4.fits
EUC_MER_BGSUB-MOSAIC-VIS_TILE79171-SM4.fits
run_Ser_SM4.sh

SM3 is the smaller one; there the big object needed only 55 iterations to finish, while on SM4 it needed 150 iterations. The shell scripts just point to the cutout images; the configuration files are the 'usual' ones.

Let me know if there are problems (I uploaded the files one by one, so it should be okay now...).

mkuemmel commented 3 years ago

...the stamps are in /euclid-fr/home/pfmerwrite/SEPP/VIS

mkuemmel commented 3 years ago

Also when fitting Disk+Bulge models to that dataset there is one overhanging object, as can be seen in https://github.com/astrorama/SourceXtractorPlusPlus/issues/361#issuecomment-926417171. But it is a different object now, with 150k pixels, that needs all 150 iterations and requires ~110 h of run time. The 92k pixel object that overhangs in the Sersic fitting also needs 150 iterations, but finishes within the bulk processing, needing ~40 h or so.

mkuemmel commented 3 years ago

I ran a cutout containing the 92k object with several limits on the number of iterations. The big object dominates the runtime on that stamp image. The number of iterations vs. runtime is: bigobject_runtime

Red are the measured points and blue is a fit to them. It is very consistent with a linear runtime per iteration: there are ~10.9 iterations per hour, or ~5.5 minutes per iteration.
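
The quoted rate follows from a straight-line fit; a sketch with illustrative stand-in numbers (not the measured points from the plot):

```python
import numpy as np

# Stand-in (max_iterations, runtime) pairs, chosen to be roughly
# consistent with the rate quoted above; not the measured data.
iters = np.array([25, 55, 100, 150])
hours = np.array([2.3, 5.0, 9.2, 13.8])

slope, intercept = np.polyfit(iters, hours, 1)  # hours per iteration
print(f"{1.0 / slope:.1f} iterations per hour")
print(f"{slope * 60:.1f} minutes per iteration")
```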

mkuemmel commented 3 years ago

Here is the cutout detection image that I used for the run time tests: bigobject_cutout

The run time is dominated by the time necessary for the bright, saturated star in the center. I ran on that detection image with two models, Sersic and Disk+Bulge, and a varying number of max_iterations to get the run time as a function of the number of iterations: bigobject_fitting

For both models the time per iteration is really linear, as the fitted lines show.

Using the full runs I selected the objects that used up all 150 iterations and plotted the estimated processing time (for both models) versus their detection area: runtime_vs_area

While there is quite some variation in the runtime for a given object size, there is an overall correlation between the detection area and the processing time for both models. There can be many reasons for that variation.

Overall it looks like the fitting of large objects with many iterations just takes some hours, and if those large objects happen to be detected and measured/fitted at the end of the detection sequence, then their measurements finish long after all other objects have been measured.

Either the user has to wait, or we limit the time spent on a single object to avoid these large overhangs at the end of a run. This is currently under discussion; a sketch of such a cap is below.
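
For concreteness, a sketch of the kind of cap being discussed: abort the minimisation once a wall-clock budget is exhausted, checked once per iteration. This is an illustration of the idea only, not the SE++/levmar API.

```python
import time

class TimeBudget:
    def __init__(self, max_hours):
        self.deadline = time.monotonic() + max_hours * 3600.0

    def exhausted(self):
        return time.monotonic() > self.deadline

def fit_with_budget(step_once, max_iterations, budget):
    """step_once() runs one minimisation iteration and returns True on
    convergence (a stand-in for the real fitting loop)."""
    for i in range(1, max_iterations + 1):
        if step_once():
            return "converged", i
        if budget.exhausted():
            return "time budget exhausted", i  # keep the best-so-far model
    return "max iterations reached", max_iterations
```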

ayllon commented 3 years ago

From what I can see, the time is spent as follows:

It doesn't add up to 100% because I am focusing on the minimization part only. The convolution code rounds dimensions up to the nearest multiple of 512 if the size is above 1024. This is because FFTW is supposed to behave better with some particular sizes (those of the form 2^a 3^b 5^c 7^d 11^e 13^f with e+f <= 1, and powers of 2 in particular).

The "good" values for <=1024 are precomputed. IIRC I had measured better performance for things like 5120x5120 than for 5000x5000 pixels. Maybe I have to adjust this heuristic and avoid overheads above a given threshold. Also, I had not measured the impact on the padding.

marcschefer commented 2 years ago

#423 and other recent fixes should have fixed that.