Unexpected behaviour when querying by identifier

askap-vast / vast-tools

A collection of tools that are useful for the VAST project and for exploration of results from the VAST Pipeline.

https://vast-survey.org/vast-tools/

MIT License

8 stars 0 forks source link

Unexpected behaviour when querying by identifier #250

Closed ddobie closed 3 years ago

ddobie commented 3 years ago

Source names are replaced with the standard SIMBAD name when querying by identifier, e.g.

source_names = ["1AXG J134412+0016", "SN 2012dy", "PSR J2129-0429"]

example_query = Query(
    source_names=source_names,
    matches_only=True, 
    epochs="all-vast", 
    crossmatch_radius=10.,
)

example_query.find_sources()

example_query.source_names

returns

['2SLAQ J134414.15+001642.2', 'SN 2012dy', 'PSR J2129-04']

I think it's important to keep those names available to the user, but the names used in the original query should be the primary reference. Should just be a matter of changing Query to not overwrite the source_names here

ddobie commented 3 years ago

As an aside, this means source-search-example.ipynb is currently broken.

ajstewart commented 3 years ago

I think I made a conscious decision about this when writing it, my thought process was that there was no feedback as to what SIMBAD had actually found, i.e. did the input find the source you were expecting? Hence I decided to use the SIMBAD names was then you can see exactly what it has found.

ajstewart commented 3 years ago

As an aside, this means source-search-example.ipynb is currently broken.

On Nimbus it is working ok, has master regressed to break it?

ddobie commented 3 years ago

As an aside, this means source-search-example.ipynb is currently broken.

On Nimbus it is working ok, has master regressed to break it?

Ah sorry, you're right. I'd just copied snippets from it. It currently works because you initially set example_query in [2] and overwrite it in [4]. If you skip running [4] (i.e. just run a query on source names) then it breaks at [11].

ddobie commented 3 years ago

I think I made a conscious decision about this when writing it, my thought process was that there was no feedback as to what SIMBAD had actually found, i.e. did the input find the source you were expecting? Hence I decided to use the SIMBAD names was then you can see exactly what it has found.

Agreed that it makes sense to be able to check exactly what your query has found, but I'd rather add that as a new column than overwrite the user-defined source names. At the moment the user can't loop over the source names they've defined, e.g.

source_names = ["1AXG J134412+0016", "SN 2012dy", "PSR J2129-0429"]

example_query = Query(source_names=source_names)

for source in source_names:
    print(example_query.results[source])

I guess the workaround to that would be looping over example_query.results.index, but it's not entirely obvious.

The other issue I can think of is if the user wants to query a list of source names, write the results to file and use that file in another script. They won't be able to simply match on the source names.

ajstewart commented 3 years ago

I can see the confusion, maybe add a SIMBAD name column when using the SIMBAD search - though I think this column will need to be accounted for through the code.

ddobie commented 3 years ago

Turns out this is trickier to implement than you'd expect.

For whatever reason Simbad.query_objects() doesn't return anything that says which queried objects it's returning, i.e. if you query 3 objects and only 1 has a simbad match, there's no way of knowing which one it is.

ajstewart commented 3 years ago

Turns out this is trickier to implement than you'd expect.

For whatever reason Simbad.query_objects() doesn't return anything that says which queried objects it's returning, i.e. if you query 3 objects and only 1 has a simbad match, there's no way of knowing which one it is.

Ah yep I remember that, I wondered if this also factored into this decision. There is a small note in the astroquery docs that this may be added but is not currently there. You'd have to match using the ra dec.

ddobie commented 3 years ago

Out of curiousity I just tested query_objects vs a for loop with a query for each for a list of 100 objects and the limiting factor seems to be the server response time. The run time of query_objects ranges from 0.08-8 seconds and the loop ranges from 0.5-15 seconds.

I'll leave the code as-is for now, but leave the issue open for if/when astroquery gets updated

ajstewart commented 3 years ago

Out of curiousity I just tested query_objects vs a for loop with a query for each for a list of 100 objects and the limiting factor seems to be the server response time. The run time of query_objects ranges from 0.08-8 seconds and the loop ranges from 0.5-15 seconds.

I'll leave the code as-is for now, but leave the issue open for if/when astroquery gets updated

I think Simbad temporary blocks you if you make too many requests in a short amount of time.

joshoewahp commented 3 years ago

With astroquery you can add the 'typed_id' field with: Simbad.add_votable_fields('typed_id')

which will include the name used for the search in a separate field. Then you can use vectorised queries to avoid looping / request limits.

I think this addresses the issues raised here right?