EranOfek / AstroPack

Astronomy & Astrophysics Software Package

Asteroid output data is huge and takes a long time to write #454

Closed: agioffe closed this 1 month ago

agioffe commented 1 month ago

Writing the asteroid data in the pipeline takes about 20 minutes, and the output object is ~20 GB:

18:50:51.787 [INF] WRX80: pipline.DemonLAST finished saving Merged Cats and Matched sources for group 1 / RunTime: 573.4
19:13:51.771 [INF] WRX80: pipline.DemonLAST finished saving Asteroid data for group 1 / RunTime: 1953.4

$ du -sh 222635v3/* |grep G
23G	222635v3/LAST.01.08.03_20230616.222625.384_clear_346+79_001_001_001_sci_merged_Asteroids_1.mat

The dataset is the standard visit used by the pipeline's unit test, pipeline.DemonLAST.unitTest: 20 images starting from LAST.01.08.03_20230616.222625.384.
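The step duration can be read off the `RunTime` fields of the two log lines. A minimal sketch, assuming `RunTime` is cumulative and measured in seconds (consistent with the wall-clock timestamps, 18:50:51.787 to 19:13:51.771, i.e. about 23 minutes); the log text is pasted inline for illustration and the variable names are hypothetical:

```shell
# The two log lines quoted above, one entry per line (pasted inline for illustration).
log='18:50:51.787 [INF] WRX80: pipline.DemonLAST finished saving Merged Cats and Matched sources for group 1 / RunTime: 573.4
19:13:51.771 [INF] WRX80: pipline.DemonLAST finished saving Asteroid data for group 1 / RunTime: 1953.4'

# Split each line on "RunTime: " and print the difference between consecutive
# RunTime values, assuming they are cumulative seconds.
step=$(printf '%s\n' "$log" | awk -F'RunTime: ' '
  NF > 1 {
    t = $2 + 0
    if (have_prev) printf "%.1f s (%.1f min)\n", t - prev, (t - prev) / 60
    prev = t; have_prev = 1
  }')

echo "Asteroid write step: $step"
```

For the two lines above this prints `Asteroid write step: 1380.0 s (23.0 min)`. The same awk program can read a full pipeline log instead of the inline string to list the time spent in every step.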

EranOfek commented 1 month ago

Please provide details, e.g., where can we find the specific dataset that was used for this test?

(In reply to Alexander Krassilchtchikov, Thu, Jun 6, 2024, 9:06 AM.)

agioffe commented 1 month ago


Thanks, updated above.

EranOfek commented 1 month ago

I meant: where do you get the raw images of the visit from?

(In reply to Alexander Krassilchtchikov, Thu, Jun 6, 2024, 1:54 PM.)

agioffe commented 1 month ago

'LASTpipelineUnitTest'

EranOfek commented 1 month ago

I suspect this is caused by a change I made recently:

In pipeline.generic.multiRaw2procCoadd: Args.MergedMatchMergedCat logical = false;

and in pipeline.generic.procMergedCoadd: Args.MergedMatchMergedCat logical = false;

I changed it back to true; the idea is that it removes most of the asteroid false alarms. @Shasha: please test with the new version.

agioffe commented 1 month ago

Thanks, will check and report here.

agioffe commented 1 month ago

Yes, this fixed both the output size and the processing speed of the whole pipeline, which went back to about 380 s in total, including all disk writes and DB interactions.