GreenBankObservatory / dysh

https://dysh.readthedocs.io

`GBTFITSLoad.write` scrambles scan order #342

Open astrofle opened 2 months ago

astrofle commented 2 months ago

Describe the bug

When writing a subset of an SDFITS file, the order of the scans should be preserved. It was not preserved for one of the output files.

How to Reproduce

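from dysh.fits.gbtfitsload import GBTFITSLoad
filename = '/home/sdfits/AGBT24A_329_02/AGBT24A_329_02.raw.vegas'
sdfits = GBTFITSLoad(filename)
# Write the plnum=1, intnum=0 subset; this produces more than one output file
sdfits.write("AGBT24A_329_02.raw.sub.vegas.fits", plnum=1, intnum=0, overwrite=False)
# Load one of the written files (not the first one) and list its scans
sdf = GBTFITSLoad("AGBT24A_329_02.raw.sub.vegas4.fits")
sdf.summary()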

The scan order in this output file is 8, 1, 2, 7.

For the first file written, the scan order is fine.
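A minimal sketch of a regression check for this, assuming the per-row scan numbers of the input and of a written file have already been extracted into plain Python lists (how to pull them out of dysh is left open here). The input order 1, 2, 7, 8 below is assumed for illustration; the issue only reports the scrambled output order. The check passes when the output's scans keep the same relative order as in the input, so it also works for a subset.

```python
def first_appearance_order(scans):
    """Return each scan number once, in the order it first appears."""
    seen = set()
    order = []
    for scan in scans:
        if scan not in seen:
            seen.add(scan)
            order.append(scan)
    return order


def preserves_scan_order(input_scans, output_scans):
    """True if the scans in output_scans keep the relative order they have
    in input_scans. The output may contain only a subset of the scans."""
    out_order = first_appearance_order(output_scans)
    wanted = set(out_order)
    expected = [s for s in first_appearance_order(input_scans) if s in wanted]
    return out_order == expected


# Assumed input order (two rows per scan); output order as reported above.
print(preserves_scan_order([1, 1, 2, 2, 7, 7, 8, 8], [8, 1, 2, 7]))  # False
print(preserves_scan_order([1, 1, 2, 2, 7, 7, 8, 8], [1, 2, 7, 8]))  # True
```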

Environment

teuben commented 1 week ago

I also ran into this in the nodding testing. It caused the results of getps() not to match those of the original data.

teuben commented 1 week ago

The solution would probably be to add a new column holding the original row number. Selections may scramble the row order, but sorting the selected table by that column would restore it. Would that come at the cost of keeping two copies in memory?
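A minimal sketch of that idea in plain Python, with a list of dicts standing in for the SDFITS table. `select_rows`, `restore_order`, and the `"ROW"` key are hypothetical names, not dysh API. Each selected row carries its original row number, so whatever later step reorders the selection, one sort on that number puts it back.

```python
from collections import defaultdict


def select_rows(table, **criteria):
    """Return the rows matching every key=value criterion, each tagged with
    its original row number under the hypothetical key "ROW"."""
    return [
        {**row, "ROW": i}
        for i, row in enumerate(table)
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def restore_order(selected):
    """Sort a selection back into original table order using "ROW"."""
    return sorted(selected, key=lambda row: row["ROW"])


# Toy table: scans 1, 2, 7, 8 with two polarizations each.
table = [{"SCAN": s, "PLNUM": p, "INTNUM": 0} for s in (1, 2, 7, 8) for p in (0, 1)]
selected = select_rows(table, PLNUM=1, INTNUM=0)

# Simulate a later step that reassembles the selection scan by scan
# in some other order.
by_scan = defaultdict(list)
for row in selected:
    by_scan[row["SCAN"]].append(row)
scrambled = [row for scan in (8, 1, 2, 7) for row in by_scan[scan]]

fixed = restore_order(scrambled)
print([row["SCAN"] for row in scrambled])  # [8, 1, 2, 7]
print([row["SCAN"] for row in fixed])      # [1, 2, 7, 8]
```

In this sketch the extra column costs one integer per selected row, and `sorted` builds a new list of references to the same row objects rather than copying the row data.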

teuben commented 1 week ago

Maybe this helps?

https://stackoverflow.com/questions/56658723/how-to-maintain-order-when-selecting-rows-in-pandas-dataframe

teuben commented 1 week ago

As mentioned, I think I fixed this in the nodding2 branch. A simple rows.sort() was needed after the rows to be written had been assembled.
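The fix described above can be sketched like this, under the assumption that the rows to write are gathered scan by scan into a list of row numbers. `assemble_rows` is a hypothetical helper for illustration, not the code in the nodding2 branch.

```python
def assemble_rows(scan_column, wanted_scans):
    """Collect the row numbers of each wanted scan, one scan at a time,
    then sort them so they are written in their original order."""
    rows = []
    for scan in wanted_scans:  # this iteration order need not match the file
        rows.extend(i for i, s in enumerate(scan_column) if s == scan)
    rows.sort()  # restore ascending (original) row order before writing
    return rows


scan_column = [1, 1, 2, 2, 7, 7, 8, 8]
rows = assemble_rows(scan_column, [8, 1, 2, 7])
print(rows)                            # [0, 1, 2, 3, 4, 5, 6, 7]
print([scan_column[i] for i in rows])  # [1, 1, 2, 2, 7, 7, 8, 8]
```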

mpound commented 1 week ago

A row-number column is already added during construction.

teuben commented 1 week ago

But that row number is not global: it seems to restart for each FITS file in a multi-file object. I haven't checked the multi-HDU case.
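One way to make the ordering key global, sketched under the assumption that each selected row records which file and HDU it came from plus its row number within that HDU. The field names `file`, `hdu`, and `row` are hypothetical. Sorting on the tuple `(file, hdu, row)` restores the original order even though per-file row numbers restart at 0; alternatively, `global_offsets` turns row counts per file (or per file and HDU, listed in order) into a single running row number.

```python
from itertools import accumulate


def global_offsets(rows_per_block):
    """First global row number of each block (file, or file+HDU), given the
    number of rows in each block listed in original order."""
    return [0, *accumulate(rows_per_block)][:-1]


def in_original_order(records):
    """Sort records by (file, hdu, row). Per-file row numbers may restart at 0;
    the file and HDU indices are compared first, so that is harmless."""
    return sorted(records, key=lambda r: (r["file"], r["hdu"], r["row"]))


records = [
    {"file": 1, "hdu": 1, "row": 0, "scan": 8},
    {"file": 0, "hdu": 2, "row": 0, "scan": 7},
    {"file": 0, "hdu": 1, "row": 0, "scan": 1},
    {"file": 0, "hdu": 1, "row": 1, "scan": 2},
]
print([r["scan"] for r in in_original_order(records)])  # [1, 2, 7, 8]
print(global_offsets([4, 3, 5]))                        # [0, 4, 7]
```

Sorting these records on `row` alone would give scans 8, 7, 1, 2, which is the kind of interleaving a non-global row number produces.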