Discuss on search functionality

chiahaoliu commented 8 years ago

@sbillinge

Since the discussion of how we deliver search functionality might become a very long thread, I am creating an issue here.

After taking suggestions from @sbillinge I have an idea on how to integrate search functionality:

def search(desired_value, *args, **kwargs):
    ''' return header(s) that satisfy args = desired_value
    arguments:
    desired_value - str - the value to look for under the key(s) named in args

    args - str - partial or complete key name(s) we want. If a partial key name is given,
    all possible results will be returned.

    kwargs - dict - a dictionary of the exact key-value pairs the user wants to look for
    '''

example:
desired_value = 'TiO2'
search(desired_value, 'sa') will return headers that have a key starting with 'sa' whose corresponding
value is 'TiO2'

search(False, **{'sample_name': desired_value, 'additional_field': 'additional_value', ....}) will return
headers that contain exactly the key-value pairs {'sample_name': desired_value, 'additional_field': 'additional_value', ....}

Does it make sense?
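To make the proposal concrete, here is a minimal sketch of the matching logic, assuming headers can be treated as plain dictionaries with string keys. The extra `headers` argument and the `None` default for `desired_value` are assumptions made so the example is self-contained; the real function would query the metadata store instead.

```python
def search(headers, desired_value=None, *key_prefixes, **exact_pairs):
    """Return the headers that satisfy every criterion given.

    headers       -- iterable of plain dicts (hypothetical stand-ins for headers)
    desired_value -- value that a prefix-matched key must hold
    key_prefixes  -- partial or complete key names
    exact_pairs   -- key-value pairs that must be present exactly
    """
    results = []
    for header in headers:
        # Every exact key-value pair must be present in the header.
        if any(k not in header or header[k] != v for k, v in exact_pairs.items()):
            continue
        # Each prefix must match at least one key that holds desired_value.
        if all(
            any(str(k).startswith(p) and v == desired_value
                for k, v in header.items())
            for p in key_prefixes
        ):
            results.append(header)
    return results
```

With this shape, `search(headers, 'TiO2', 'sa')` performs the partial-key search and `search(headers, sample_name='TiO2')` performs the exact-pair search, so the placeholder `False` is not needed when only keyword pairs are given.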

sbillinge commented 8 years ago

yes, this looks like a nice way of doing it. We need to test it to make sure, but I like the way it looks!

S

(Sent by email in reply to chiahaoliu's comment of Mon, Oct 26, 2015, 3:10 PM.)

chiahaoliu commented 8 years ago

Finished the first version of the search function. I will push and test it later this evening.

chiahaoliu commented 8 years ago

@sbillinge

Now that the other functionality is getting stable, I am thinking of putting some effort into the search functionality. I have a thought:

Does it make sense to separate the chemical composition from the quantity of each element? I found that a MongoDB search is fairly loose when the field being searched is a list. E.g. a search on Tim in "experimenters" will return Tim, [Tim, Simon], [Tim, Simon, Max] and so on. So if we change the format you came up with a little and store the elements separately from their quantities, we can still have a powerful yet general search. When setting up the metadata dictionary it would look like:

composition = ['Na','Cl'] , quantity = [1,1]

Then a search with 'Na' should yield:

composition = ['Na'] , quantity = [1]
composition = ['Na', 'Cl'] , quantity = [1,1]
composition = ['Na', 'S', 'O'] , quantity = [1,1,3]
... and so on.

This way the user has to give more inputs (although it is highly likely that we can find an existing package to split chemical names), but it ensures a very compact and general search in the future.

Does it sound too complicated? Alternatively, I can write a search aimed specifically at sample composition, so that the [{'phase1':{'Na':1}},{'phase2':{'Cl':1}}] format is still good for metadata.
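The parallel-list idea can be sketched in memory as below. This is only an illustration of the matching rule (a scalar query matches a list field when the list contains that value, which is how MongoDB treats array fields); the function names `find_by_element` and `amount_of` are hypothetical and no database is involved.

```python
def find_by_element(samples, element):
    """Return the samples whose 'composition' list contains `element`."""
    return [s for s in samples if element in s.get('composition', [])]


def amount_of(sample, element):
    """Look up the quantity stored at the same index as `element`."""
    index = sample['composition'].index(element)
    return sample['quantity'][index]


samples = [
    {'composition': ['Na'], 'quantity': [1]},
    {'composition': ['Na', 'Cl'], 'quantity': [1, 1]},
    {'composition': ['Na', 'S', 'O'], 'quantity': [1, 1, 3]},
    {'composition': ['Ti', 'O'], 'quantity': [1, 2]},
]

# A search on 'Na' picks up every sample that lists Na, whatever else it contains.
na_samples = find_by_element(samples, 'Na')
```

The trade-off is that the two lists must stay aligned: a missing or swapped entry in `quantity` silently changes the meaning of the record.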

sbillinge commented 8 years ago

It sounds fine to make it as lists like that. The greater problem is the possibility of introducing errors. For example, a missing or permuted number in the composition is an easy mistake to make. But later we will have tools to help put these together, which will reduce this problem. I think we can also pull compounds from databases as suggestions, which will make it much easier for users. So use the data structure that makes the most sense on the searching side and we will adapt.

I committed a new version of the acquireFuncs where I changed the docstring on new_sample() as the logic you had used was a bit wrong.

Here, for the compounds (or phases) in our sample, we should use 'phase_name', 'phase_amt', then the sample composition could be {'phase_name','phase_amt','elements','element_amt'}, something like that.

(Sent by email in reply to Timothy Liu's comment of Wed, Nov 4, 2015, 12:24 PM.)

chiahaoliu commented 8 years ago

I found a parsing module, 'pyparsing', that seems to be widely used and frequently updated by a group at MIT. I am looking into it and hope it can help us avoid typos.

pavoljuhas commented 8 years ago

@chiahaoliu - please remove the pyparsing.py file from the repo. pyparsing is included as a standard package in Anaconda and if necessary can be easily installed.

What would you like to do with pyparsing? If it is a basic translation of chemical formulas into a list of species and counts, there is a function composition_analysis for that in PDFgetX3. We could reuse it here if suitable:

In [1]: from diffpy.pdfgetx.functs import composition_analysis
In [2]: composition_analysis('NaCl')
Out[2]: (['Na', 'Cl'], [1.0, 1.0])
In [3]: composition_analysis('H2SO4')
Out[3]: (['H', 'S', 'O'], [2.0, 1.0, 4.0])
In [4]: composition_analysis('Sr Ti   O 3')
Out[4]: (['Sr', 'Ti', 'O'], [1.0, 1.0, 3.0])
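For readers without PDFgetX3 at hand, the kind of translation shown above can be approximated with a short regular expression. This is a hypothetical stand-in named `parse_formula`, not the PDFgetX3 implementation: it assumes flat formulas only (no parentheses or hydrates) and does not check that the symbols are real elements.

```python
import re

# One element symbol followed by an optional integer or decimal count.
_TOKEN = r'([A-Z][a-z]?)(\d+\.?\d*|\.\d+)?'


def parse_formula(formula):
    """Split a flat chemical formula into (species, amounts) lists."""
    compact = re.sub(r'\s+', '', formula)
    # Reject anything that is not a plain sequence of symbol/count tokens.
    if not re.fullmatch(r'(?:[A-Z][a-z]?(?:\d+\.?\d*|\.\d+)?)+', compact):
        raise ValueError('cannot parse formula: %r' % formula)
    species, amounts = [], []
    for symbol, count in re.findall(_TOKEN, compact):
        amount = float(count) if count else 1.0
        if symbol in species:
            # Repeated symbols are merged into a single entry.
            amounts[species.index(symbol)] += amount
        else:
            species.append(symbol)
            amounts.append(amount)
    return species, amounts
```

The two returned lists line up index by index, which is the layout proposed earlier in this thread for the searchable composition and quantity fields.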

chiahaoliu commented 8 years ago

Yes, that is exactly what I want to do, and I also found that pyparsing is already included in conda. I will remove it from the repo now.

pavoljuhas commented 8 years ago

OK, I will then add the composition_analysis function to the repo. I will comment here when done.

chiahaoliu commented 8 years ago

Thanks, Pavol!

sbillinge commented 8 years ago

nice!

(Sent by email in reply to Pavol Juhas's comment of Wed, Nov 4, 2015, 4:10 PM.)

pavoljuhas commented 8 years ago

Function composition_analysis added in ebcb0091d802ed5c591939b67d958e55b6cf0e27.

Example use:

from xpdacquire.utils import composition_analysis
species, amounts = composition_analysis('CaCO3')

chiahaoliu / xpdAcquireFuncs

Discuss on search functionality #12