londumas opened this issue 5 years ago (status: open).
Pre-sorting by TARGETID would make finding a single specific target with numpy.searchsorted an efficient operation, but I don't think it helps much for larger join operations (e.g. with a truth catalog), which generally don't assume the inputs are sorted. Others might want it sorted by other quantities. When we get to the full 5-year catalog, it will be more efficient to append to it in arbitrary order while merging N>>1 individual healpix zbest files, rather than having to load everything into memory at once, sort it in place, and then write it back out.
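To illustrate the single-target case mentioned above, here is a minimal sketch assuming the TARGETID column is available as a NumPy integer array that is already sorted in ascending order. The helper name `find_target` and the TARGETID values are hypothetical; this is not code from desi_zcatalog.

```python
import numpy as np

def find_target(targetids_sorted, targetid):
    """Return the row index of `targetid` in a TARGETID-sorted array, or -1.

    np.searchsorted performs a binary search, so each lookup is O(log N),
    but the result is only meaningful if the input is sorted ascending.
    """
    i = np.searchsorted(targetids_sorted, targetid)
    # searchsorted returns an insertion point, so confirm the value is really there.
    if i < len(targetids_sorted) and targetids_sorted[i] == targetid:
        return int(i)
    return -1

# Hypothetical TARGETID values, already sorted.
targetids = np.array([101, 205, 333, 478, 912], dtype=np.int64)
print(find_target(targetids, 333))  # 2
print(find_target(targetids, 400))  # -1
```

On an unsorted column the same call silently returns wrong answers, which is why this speedup depends on the file being pre-sorted; general joins typically sort or hash their inputs internally and gain little from it.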
So, I'm open to considering it, but could you explain more about why you want it sorted by TARGETID? How big a hassle is it to not be pre-sorted? Do others have opinions about sorting by other quantities, or purposefully not sorting?
@sbailey, it is mostly my OCD talking: I prefer having things sorted rather than in a random, unpredictable order. I was not making any statement about efficiency. Anyway, if you think we should not bother, we can close this ticket. As I said, it is not that important.
Another reason to do it is that it would make comparing mock realization catalogs much simpler.
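To show why a fixed order helps when comparing catalogs, here is a minimal sketch assuming each catalog is a NumPy structured array with a TARGETID column. The column layout, the helper name `sort_by_targetid`, and the values are hypothetical.

```python
import numpy as np

def sort_by_targetid(cat):
    """Return a copy of a structured array with rows ordered by TARGETID."""
    # kind="stable" keeps the original relative order of any duplicate TARGETIDs.
    return cat[np.argsort(cat["TARGETID"], kind="stable")]

# Two hypothetical catalogs containing the same rows written in different orders.
dtype = [("TARGETID", np.int64), ("Z", np.float64)]
cat_a = np.array([(205, 1.1), (101, 0.5), (333, 2.0)], dtype=dtype)
cat_b = np.array([(333, 2.0), (205, 1.1), (101, 0.5)], dtype=dtype)

# A row-by-row comparison fails on the raw arrays but succeeds once both are sorted.
print(np.array_equal(cat_a, cat_b))  # False
print(np.array_equal(sort_by_targetid(cat_a), sort_by_targetid(cat_b)))  # True
```

If the output were always written in TARGETID order, the sorting step could be skipped and catalogs could be compared directly.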
This is a minor request, but a useful one: desi_zcatalog output should be sorted. I think the best choice is to sort by TARGETID.
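As a rough illustration of what sorting the merged output by TARGETID could look like, here is a minimal sketch assuming each per-file chunk is a NumPy structured array with the same columns. The helper `merge_and_sort` and the sample chunks are hypothetical stand-ins for tables read from individual healpix zbest files; this is not how desi_zcatalog is actually implemented.

```python
import numpy as np

def merge_and_sort(chunks):
    """Concatenate per-file catalog chunks and order the result by TARGETID.

    Hypothetical helper: `chunks` may arrive in any order. Every row is held
    in memory at once, which is the cost raised in the discussion for the
    full 5-year catalog.
    """
    merged = np.concatenate(chunks)
    order = np.argsort(merged["TARGETID"], kind="stable")
    return merged[order]

dtype = [("TARGETID", np.int64), ("Z", np.float64)]
chunk1 = np.array([(912, 0.8), (101, 0.5)], dtype=dtype)
chunk2 = np.array([(478, 1.3), (205, 1.1)], dtype=dtype)

zcat = merge_and_sort([chunk1, chunk2])
print(zcat["TARGETID"])  # [101 205 478 912]
```

The single argsort at the end is what makes the output order independent of the order in which the input files were read.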