desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License

desi_zcatalog output should be sorted #768

Open londumas opened 5 years ago

londumas commented 5 years ago

This is a minor request, but a useful one: desi_zcatalog output should be sorted. I think it would be best to sort by TARGETID.

sbailey commented 5 years ago

Pre-sorting by TARGETID would make finding a single specific target using numpy.searchsorted an efficient operation, but I don't think it helps much for larger join operations (e.g. with a truth catalog), which generally don't assume the inputs are sorted. Others might want it sorted by other quantities. When we get to the full 5 year catalog, it will be more efficient to be able to append to it in arbitrary order while merging N>>1 individual healpix zbest files, without having to load everything into memory at once, sort it in place, and then write it back out.
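
For illustration, a minimal sketch of the single-target lookup mentioned above. The `targetid` array and the `find_row` helper are hypothetical stand-ins, not part of desispec; with an unsorted catalog the same lookup works if you compute the sort order once with `numpy.argsort`.

```python
import numpy as np

# Hypothetical TARGETID column, standing in for a zcatalog table column.
targetid = np.array([42, 7, 19, 3, 88], dtype=np.int64)

# Sort once; a catalog already sorted by TARGETID would skip this step.
order = np.argsort(targetid)
sorted_ids = targetid[order]

def find_row(sorted_ids, order, wanted):
    """Return the original row index of `wanted`, or -1 if it is absent."""
    # Binary search for the insertion point of `wanted`.
    i = np.searchsorted(sorted_ids, wanted)
    if i < len(sorted_ids) and sorted_ids[i] == wanted:
        return int(order[i])
    return -1

row = find_row(sorted_ids, order, 19)     # row of TARGETID 19 in the unsorted array
missing = find_row(sorted_ids, order, 5)  # TARGETID 5 is not in the catalog
```

Each lookup is a binary search, so the cost of the one-time sort only pays off when many single-target lookups are made against the same catalog.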

So, I'm open to considering it, but could you explain more about why you want it sorted by TARGETID? How big a hassle is it to not be pre-sorted? Do others have opinions about sorting by other quantities, or purposefully not sorting?

londumas commented 5 years ago

@sbailey, it is more my OCD talking, wanting things sorted instead of in a random and unpredictable order. I was not making any statement about efficiency. Anyway, if you think we should not bother, we can close this ticket. As I said, it is not that important.

londumas commented 5 years ago

Another reason to do it is that it then becomes much simpler to compare mock realization catalogs.
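
As a rough sketch of that comparison, assuming two catalogs that contain the same targets in different row order (the arrays below are made-up toy data, not real mock output): once both are ordered by TARGETID, the rows line up and can be compared directly.

```python
import numpy as np

# Toy stand-ins for the TARGETID and Z columns of two mock realizations.
ids_a = np.array([5, 1, 3], dtype=np.int64)
z_a = np.array([0.5, 0.1, 0.3])
ids_b = np.array([3, 5, 1], dtype=np.int64)
z_b = np.array([0.3, 0.5, 0.1])

# Order each catalog by TARGETID so that row i refers to the same target in both.
ia = np.argsort(ids_a)
ib = np.argsort(ids_b)

# Check that both catalogs hold the same targets, then compare redshifts row by row.
same_targets = np.array_equal(ids_a[ia], ids_b[ib])
dz = z_a[ia] - z_b[ib]
```

If the catalogs were already written sorted by TARGETID, the two `argsort` calls would be unnecessary.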