CroatianMeteorNetwork / RMS

RPi Meteor Station
https://globalmeteornetwork.org/
GNU General Public License v3.0
176 stars 50 forks source link

TrackStack performance degradation on RPi devices after recent changes #321

Closed aitov closed 1 month ago

aitov commented 1 month ago

Hi,

User, which uses RPi4 for track stack generation complains that it stop working on his device and failed with timeout error for Pool worker with 2000 seconds. We tried increase it to 3000 seconds and see that processing of one image takes more than 33 minutes by one cpu core. With such speed it will process forever.

I rolled back version to last unmodified revision (b4ff31ab) and restarted this processing: it takes 5 minutes for all 11 images ( avg 2min per image) so performance degraded dramatically for this device.

I tried find the bottleneck place and see that was after this change: from :TrackStack->xyToRaDecPP->cyXYToRADec ->equatorialCoordPrecession to : TrackStack->xyToRaDecPP->cyXYToRADec ->equatorialCoordAndRotPrecession

For me it hard to understand logic and propose the fix, but maybe for TrackStack we could use old implementation. @dvida @Cybis320 please advise, seems like all users which uses RPi4 for TrackStack affected

Cybis320 commented 1 month ago

Hi Atoiv, The update is more computationally demanding for sure. Calculating the rotation of the FOV due to precession in particular is difficult. For that reason, the pointing and rotation are pre-calculated so that the new pointing can be reused for all datapoints in the set as long as they share the same timestamp. In the TrackStack code, we can't take advantage of precalc because each datapoint is passed individually instead of in an array to the cyXYToRaDec function. It'd be interserting to see if optimizing this section would help. Since getMiddleTimeFF(os.path.basename(ff_temp) will return the same value per FF file, the datapoint should be passed in sets to the function.

# List of corners
    x_corns = [0, pp_ref.X_res,            0, pp_ref.X_res]
    y_corns = [0,            0, pp_ref.Y_res, pp_ref.Y_res]

    ra_list = []
    dec_list = []

    for ff_temp in ff_found_list:

        # Load the recalibrated platepar
        pp_temp = Platepar()
        pp_temp.loadFromDict(recalibrated_platepars[ff_temp], use_flat=config.use_flat)

        for x_c, y_c in zip(x_corns, y_corns):
            _, ra_temp, dec_temp, _ = xyToRaDecPP(
                [getMiddleTimeFF(os.path.basename(ff_temp), config.fps, ret_milliseconds=True)], [x_c], [y_c], [1], pp_ref,
                extinction_correction=False)
            ra_c, dec_c = ra_temp[0], dec_temp[0]

            ra_list.append(ra_c)
            dec_list.append(dec_c)
Cybis320 commented 1 month ago

maybe try something like this?

# List of corners
corners = [
    [0, 0],
    [pp_ref.X_res, 0],
    [0, pp_ref.Y_res],
    [pp_ref.X_res, pp_ref.Y_res]
]
ra_list = []
dec_list = []

for ff_temp in ff_found_list:
    # Load the recalibrated platepar
    pp_temp = Platepar()
    pp_temp.loadFromDict(recalibrated_platepars[ff_temp], use_flat=config.use_flat)
    jd_temp = getMiddleTimeFF(os.path.basename(ff_temp), config.fps, ret_milliseconds=True)

    # Convert all corners at once
    _, ra_temp, dec_temp, _ = xyToRaDecPP(
        [jd_temp] * len(corners),
        [corner[0] for corner in corners],
        [corner[1] for corner in corners],
        [1] * len(corners),
        pp_ref,
        extinction_correction=False
    )

    ra_list.extend(ra_temp)
    dec_list.extend(dec_temp)
Cybis320 commented 1 month ago

If this works, I can see other places in RMS where a similar optimization could be applied. Like in CheckFit, we could go from:

# Go through all matched stars
        for jd in matched_stars:

            matched_img_stars, matched_cat_stars, dist_list = matched_stars[jd]

            # Go through all stars on the image
            for img_star_entry, cat_star_entry in zip(matched_img_stars, matched_cat_stars):

                # Extract star coords
                star_y = img_star_entry[0]
                star_x = img_star_entry[1]
                cat_ra = cat_star_entry[0]
                cat_dec = cat_star_entry[1]

                # Convert image coordinates to RA/Dec
                _, star_ra, star_dec, _ = xyToRaDecPP([jd2Date(jd)], [star_x], [star_y], [1], \
                    platepar, extinction_correction=False, precompute_pointing_corr=True)

                # Compute angular distance between the predicted and the catalog position
                ang_dist = np.degrees(angularSeparation(np.radians(cat_ra), np.radians(cat_dec), \
                    np.radians(star_ra[0]), np.radians(star_dec[0])))

                # Store the angular separation in arc minutes
                global_dist_list.append(ang_dist*60)

To:

global_dist_list = []

# Go through all matched stars
for jd, (matched_img_stars, matched_cat_stars, dist_list) in matched_stars.items():
    # Prepare lists for batch processing
    star_x_list = []
    star_y_list = []
    cat_ra_list = []
    cat_dec_list = []

    # Collect all star coordinates for this jd
    for img_star_entry, cat_star_entry in zip(matched_img_stars, matched_cat_stars):
        star_y, star_x = img_star_entry[0], img_star_entry[1]
        cat_ra, cat_dec = cat_star_entry[0], cat_star_entry[1]

        star_x_list.append(star_x)
        star_y_list.append(star_y)
        cat_ra_list.append(cat_ra)
        cat_dec_list.append(cat_dec)

    # Convert all image coordinates to RA/Dec in one call
    jd_list = [jd2Date(jd)] * len(star_x_list)
    _, star_ra_list, star_dec_list, _ = xyToRaDecPP(
        jd_list, star_x_list, star_y_list, [1] * len(star_x_list),
        platepar, extinction_correction=False, precompute_pointing_corr=True
    )

    # Compute angular distances
    for cat_ra, cat_dec, star_ra, star_dec in zip(cat_ra_list, cat_dec_list, star_ra_list, star_dec_list):
        ang_dist = np.degrees(angularSeparation(
            np.radians(cat_ra), np.radians(cat_dec),
            np.radians(star_ra), np.radians(star_dec)
        ))
        global_dist_list.append(ang_dist * 60)  # Store the angular separation in arc minutes

There are similar scenarios in FFTalign, Platepar, Flux, and SkyFit.

dvida commented 1 month ago

This has actually already been addressed and it is a part of the prerelease branch. See the "precompute_pointing" parameter: https://github.com/CroatianMeteorNetwork/RMS/blob/prerelease/Utils/TrackStack.py#L368

The branch is still being tested, but we'll hopefully merge it into master soon.

Denis

Cybis320 commented 1 month ago

Oops, I thought precompute was already in master. Sorry. This being said, the loop in question cannot currently take advantage of precompute. This would:

    # Go through all FF files and find RA/Dec of image corners to find the size of the stack image ###

    # List of corners
    corners = [
        [0, 0],
        [pp_ref.X_res, 0],
        [0, pp_ref.Y_res],
        [pp_ref.X_res, pp_ref.Y_res]
    ]
    ra_list = []
    dec_list = []

    for ff_temp in ff_found_list:
        # Load the recalibrated platepar
        pp_temp = Platepar()
        pp_temp.loadFromDict(recalibrated_platepars[ff_temp], use_flat=config.use_flat)
        jd_temp = getMiddleTimeFF(os.path.basename(ff_temp), config.fps, ret_milliseconds=True)

        # Convert all corners at once
        _, ra_temp, dec_temp, _ = xyToRaDecPP(
            [jd_temp] * len(corners),
            [corner[0] for corner in corners],
            [corner[1] for corner in corners],
            [1] * len(corners),
            pp_ref,
            extinction_correction=False,
            precompute_pointing_corr=True
        )

        ra_list.extend(ra_temp)
        dec_list.extend(dec_temp)

Is it worth a PR?

aitov commented 1 month ago

@Cybis320 @dvida thank you for answers, good to know that it is already addressed, will wait for new version, please let me know if need help with testing

dvida commented 1 month ago

@Cybis320 Feel free to post a PR and we'll test it :)

g7gpr commented 1 month ago

Also seen same issue; thanks for the investigation work; also happy to test.

Cybis320 commented 1 month ago

Ok, so I was wrong about this, pre-compute doesn't need same jd to work, it just take the mean of the jd's. However, the issue is that the script transforms XYToRaDec for every single pixels of every single image - a million times per image! There ought to be a better way.

dvida commented 1 month ago

It does need to map the image coordinates to celestial coordinates so they are all in the same system. It's by design really.

g7gpr commented 1 month ago

Trackstack has always worked in this way, so I don't understand why it has suddenly started to run much more slowly. I could process 8 individual cameras and a combined trackstack from all 8 within 4 hours of completed processing. Now it takes a couple of days to complete.

Cybis320 commented 1 month ago

Are you still seeing slow perf on prerelease ?On Aug 1, 2024, at 5:51 PM, David Rollinson @.***> wrote: Trackstack has always worked in this way, so I don't understand why it has suddenly started to run much more slowly. I could process 8 individual cameras and a combined trackstack from all 8 within 4 hours of completed processing. Now it takes a couple of days to complete.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

Cybis320 commented 1 month ago

On master, at the moment, it makes a lot of sense why it’d be so slow. Performing precession correction a million times per frame would be extremely slow indeed. With precompute on prerelease, it should be back to its baseline performance. I still think there are probably better ways to stack the images but it’d be time consuming to developed. On Aug 1, 2024, at 5:51 PM, David Rollinson @.***> wrote: Trackstack has always worked in this way, so I don't understand why it has suddenly started to run much more slowly. I could process 8 individual cameras and a combined trackstack from all 8 within 4 hours of completed processing. Now it takes a couple of days to complete.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

g7gpr commented 1 month ago

As soon as pr #320 gets merged into prerelease I'll test it. Grateful if you can test pr #320 and report what you find so it can be merged in. I can't run a station on prerelease without kht breaking.

--correction, since #320 is based on prerelease, it should be trivial to test this.

g7gpr commented 1 month ago

Prerelease is approximately 6 times faster than master, 33 minutes vs 3 hours

Cybis320 commented 1 month ago

Yay. And that feels like it’s roughly back to where it was?

g7gpr commented 1 month ago

Have you checked it against all the versions of Python which are running RMS?

Cybis320 commented 1 month ago

Have you checked it against all the versions of Python which are running RMS?

Checked StackTrack against all versions of Python running RMS? No, I didn't. I don't think the changes made to StackTrack in prerelease are significant. The speed up is from changes made in CyFunction. I don't have the ability to check Prerelease on python 2.7. If there were problems with the precession correction running on 2.7, we would have noticed by now I would imagine. The major change most likely to cause issues is already in master. The pre computation change in CyFunction from master to prerelease is begnign and isn't likely to have any python version specific issue with it.

g7gpr commented 1 month ago

I'm seeing strange behaviour on 3.7.3, can you test.

python Utils/TrackStack.py ~/RMS_data/CapturedFiles/AU000A_20240730_100337_377485

  File "/home/au000a/vRMS/lib/python3.7/site-packages/RMS-0.1-py3.7-linux-x86_64.egg/RMS/QueuedPool.py", line 356, in _workerFunc
    result = func(*all_args, **self.func_kwargs)
  File "Utils/TrackStack.py", line 370, in stackFrame
    len(x_coords) * [1], pp_temp, extinction_correction=False, precompute_pointing_corr=True)
TypeError: xyToRaDecPP() got an unexpected keyword argument 'precompute_pointing_corr'

There was some change in keyword handling between python 3.7 and 3.8, especially for changeable numbers of parameters. I thikn it might be more to do with multiprocessing rather than precession.

Cybis320 commented 1 month ago

I'll try to replicate. Are you sure the xyToRaDecPP function is up to date on that machine, both in the source folder and the vRMS folder? Should be: /home/au000a/vRMS/lib/python3.7/site-packages/RMS-0.1-py3.7-linux-x86_64.egg/RMS/Astrometry/ApplyAstrometry.py line 175:

   # Compute RA/Dec in J2000
    _, ra_data, dec_data, _ = xyToRaDecPP(len(x_data)*[jd2Date(jd)], x_data, y_data, len(x_data)*[1], \
        platepar, extinction_correction=False, precompute_pointing_corr=True)
g7gpr commented 1 month ago

Confirmed all up to date.

g7gpr commented 1 month ago

It's an issue specific to this installation; apologies. I don't know what is causing it, but it runs fine on a fresh install of 3.7.3.

Cybis320 commented 1 month ago

No worries. I get this issue too sometimes where it seems to be running a cached version of the code from somewhere. It usually happens when manually applying changes.

g7gpr commented 1 month ago

Working well everywhere I have tested it.

markmac99 commented 1 month ago

Hoping for a fix to this soon - i kicked off a 467-image trackstack at noon today which is still running at 2100 on an 8-core i7 windows machine !

dvida commented 1 month ago

It's merged to pre-release, we just need to get it to a well tested state and merge.

markmac99 commented 1 month ago

good news, i will pull prerelease to my workstation and test it here too !

markmac99 commented 1 month ago

I switched UK0006 to this branch last night, but unfortunately it errored this morning. The Pi is running Bookworm-64bit, Python 3.11. I'll try rerunning it to see if i can get a clearer picture of the error.

  File "/home/rms/source/RMS/Utils/TrackStack.py", line 299, in trackStack
    crop_box = (np.min(non_empty_rows), np.max(non_empty_rows), np.min(non_empty_columns),
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rms/vRMS/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 2953, in min
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rms/vRMS/lib/python3.11/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zero-size array to reduction operation minimum which has no identity
dvida commented 1 month ago

I wonder if it's numpy version or data specific. I just ran it locally and it worked fine. Are you running numpy 2 by any chance?

markmac99 commented 1 month ago

numpy==1.26.4.

I just manually reran it and it did finish this time. I'll let it run again tonight and see if it works this time. Its possible that RMS was in an inconsistent state since i cut over to the new branch without rebooting the system,

Cybis320 commented 1 month ago

It's a little perplexing because nothing changed in TrackStack in prerelease, right?

dvida commented 1 month ago

Just one line which turns on the astrometry mapping optimization.

markmac99 commented 1 month ago

Today's testing:

UK0006 - worked, stacked 450 meteors in ~45 mins Unfortunately however, on the other cameras performance on the prerelease branch has reverted back. Its taken 35 minutes to get to 2% on UK001L (8/493) and 15 minutes for 0F to get to 1% (4/532). I switched them both to the prerelease branch, rebooted and manually ran RMS_Update.sh just for good measure.

UK000F, 1L and 2F are running Buster 32bit, Python3.7, numpy 1.21. These cameras are using the CMN prerelease branch. UK0006 is using my own Fork, into which i pulled prerelease a couple of days ago. I have not resynced it since then.

markmac99 commented 1 month ago

another point to note - my pis are running with CPU temp ~60 to 65 C during trackstacking. Is this going to cause performance issues? I think the Pi throtttles if it gets too hot.

markmac99 commented 1 month ago

okay, confirmed that my fork's prerelease branch, whch is 28 pushes behind the CMN one, is way faster: 5% done in 8 minutes, 28/527) on UK002F compared to 4% done in 90 minutes (10/493) on UK001L. I'll switch 1L to my own fork and retest but would be worth reviewing recent pushes to prerelease.

markmac99 commented 1 month ago

I had a look at the commits and there were changes made to ApplyAstrometry.py which might be relevant. I am tied up with Perseid processing for ukmon but if anyone has spare time these are the comparisons to my fork: https://github.com/markmac99/RMS/commit/d7f2b0eaea92d1b8dec12f385daebf08d7fe0b75 https://github.com/markmac99/RMS/commit/92e56b78f63f6021592691cbcbcd1809faa8fa29 Just to confirm, my fork cloned before these changes is running 10x faster than prerelease (15% of 527 in 30 mins compared to 4% of 532 in 90 minutes).

dvida commented 1 month ago

Hi Mark, I tested it yesterday and it ran quite fast on my machine. I looked at the changes, but they have nothing to do with astrometry and mapping (the change was with the photometry fit). Could it be that the machines had a load on them or something similar?

markmac99 commented 1 month ago

@dvida no, there's nothing else running except the background RMS daytime process. I can try killing that to see if it helps. I agree, its probably not due to the recent changes - i swapped UK001L to my fork and it also ran badly. After 2 hours it had not even stacked 5%. Both the main and my fork run very quickly on my Windows desktop (8 code 32GB memory),

Normally, i am running Trackstack as part of a post-processing script. Due to perseid volumes, this isn't running till about 10am, and this week, it has been extremely hot and the Pi's CPU is already at 60C due to the heavy workload of morning processing. Its weird though that it worked on UK0006, but not on the other three.

Cybis320 commented 1 month ago

The change most likely affecting performance was in cyFunctions.pyx. Since it’s a Cython file, it needs to be re-built for the changes to take effect. A slow TrackStack could be due to running an old cached version of cyFunctions. Was RMS_update.sh specifically run on the slow machines?

markmac99 commented 1 month ago

Yes, i ran RMS_Update.sh afterwards, and in fact rebooted as well and waited till RMS was ready. Nothing else running on the Pi, current performance is unfortunately still bad. Started stacking at 18:33 UT, only at 5% by 20:09. when i had to stop it to prevent interference with RMS data capture. Obviously data volumes are a factor here, but on my Windows machine (i7-3770, 8 core, 32GB memory) the same data gets processed in about 45 minutes. UK001L is a Pi4 with 2GB memory, 128GB SD, 20GB free, running Buster 32-bit. UK0006, where it ran fine today, is a Bookworm-64bit build but otherwise identical hardware. The swap is smaller on 1L but i i don't see any signs of excessive memory use so i doubt that would be a factor? I can try a completely clean install of RMS i guess.

Cybis320 commented 1 month ago

So strange. Maybe add a debug print statement in CyFunctions.pyx line 1477:

 if precompute_pointing_corr:
        print("I'M HERE")
        # Correct the pointing for precession (output in radians)
        ra_ref_now_corr, dec_ref_corr, pos_angle_ref_now_corr = pointingCorrection(
            np.mean(jd_data), radians(lat), radians(lon), 
            radians(h0), jd_ref, radians(ra_ref), radians(dec_ref), radians(pos_angle_ref), 
            refraction=refraction
            )

Does it print? That will determine if it's actually running the new code and not some mysteriously cached old version.

dvida commented 1 month ago

Could it be due to SD card degradation?

Cybis320 commented 1 month ago

it's not pulling from master after the reboot, right?

markmac99 commented 1 month ago

@Cybis320 i checked that the branch is correct. I'll amend cyFunctions as suggested and test it again tomorrow. @dvida its possible, though it seems unlikely all three would go bad at the same time. ps - it took 22 mins on my PC, so its definitely fast there!

markmac99 commented 1 month ago

ok, i added that print statement, reran RMS_Update.sh, and the print statement does NOT appear in the output. So, it seems likely RMS is cacheing the code somewhere. I just noticed however that setup.py errors out due to a missing file?

installing package data to build/bdist.linux-armv7l/egg
running install_data
error: can't copy 'share/platepar_templates': doesn't exist or not a regular file

I feel thats probably going to stop RMS updating properly?

markmac99 commented 1 month ago

I think the reason setup is erroring is because platepar_templates is a directory, and setup() expects data_files to be a list of files. I wonder if this is a python or setuptools version issue? i have python 3.7 and setuptools 57.0.0 on these machines.

dvida commented 1 month ago

Great catch, I fixed it now.

markmac99 commented 1 month ago

Yep, it installs now, and looks like TrackStack is working as expected. ! Just ran a test with 66 detections and it ran in 18 minutes, even though RMS is running data capture at the same time. I'll pull the change into the other machines and validate it tomorrow.

Aside: every time i run RMS_Update i think to myself "i really must find time to silence those numpy pointer cast warnings... ..." :)

markmac99 commented 1 month ago

Argh. I just noticed another breaking change has been added.

    # Make sure the paths are directories and not files, by stripping the file name
    dir_paths = [os.path.dirname(dir_path) for dir_path in dir_paths]

This breaks TrackStack because the argument is already directory. With this change, if one passes for example

python -m Utils.TrackStack ~/RMS_data/ArchivedFiles/UK001L_20240810_201546_511219

which is currently the expected format of the argument, then the final part of the path is stripped off causing Trackstack to fail with the error

unable to find a unique platepars file in /home/pi/RMS_data/ArchivedFiles

Looks like this was added yesterday evening. I'm going to comment it out for now as it definitely breaks current usage of trackstack.

markmac99 commented 1 month ago

put in a fix for my previous comment which i think works for both cases.