compomics / searchgui

Highly adaptable common interface for proteomics search and de novo engines
http://compomics.github.io/projects/searchgui.html

parameter file format #60

Closed wolski closed 8 years ago

wolski commented 8 years ago

This is an improvement suggestion...

It seems that you are using some binary format to store the parameters of search GUI. I can't see a reason for this type of format when the file is just 10K.

hbarsnes commented 8 years ago

The parameters file is binary because it is simply the serialized version of the Java search parameter object. Writing this information to and reading it back from text/XML files only adds room for mistakes, especially as the amount of information in this file grows with every search engine supported and the file also contains complex objects such as user-designed PTMs.
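For readers unfamiliar with what "the serialized version of the Java object" means in practice, here is a minimal sketch of standard Java serialization. The `SearchParameters` class and its fields are hypothetical stand-ins, not SearchGUI's actual classes; the sketch only illustrates why the resulting file is not readable in a text editor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BinaryParametersSketch {

    /** Hypothetical stand-in for a search parameter object (not SearchGUI's class). */
    static class SearchParameters implements Serializable {
        private static final long serialVersionUID = 1L;
        String enzyme = "Trypsin";
        double precursorTolerancePpm = 10.0;
    }

    /** Serializes the object with standard Java serialization. */
    static byte[] toBytes(SearchParameters parameters) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(parameters);
        }
        return buffer.toByteArray();
    }

    /** Restores the object; this needs a compatible class on the classpath. */
    static SearchParameters fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SearchParameters) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = toBytes(new SearchParameters());
        // Java serialization streams start with the magic number 0xACED.
        System.out.printf("first bytes: %02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF);
        System.out.println("enzyme after round trip: " + fromBytes(bytes).enzyme);
    }
}
```

The round trip needs no mapping code, which is the convenience described above. The cost is that the values can only be inspected by loading the file back into a program that has the class.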

wolski commented 8 years ago

Hi,

If I run a search with SearchGUI and someone asks me which parameters I used, should I send him a screenshot of the UI?!

I did not reply for a while because I did not expect such an answer. I still do not really know what to say except the obvious ...

You operate in a context where reproducibility is fundamental. The parameter file needs to be in a human-readable format. It is also there to document which parameters were used for the search. Please also see: https://en.wikipedia.org/wiki/Configuration_file https://en.wikipedia.org/wiki/.properties
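As an illustration of the .properties format linked above, the following sketch writes and reads a few parameters with `java.util.Properties` from the Java standard library. The keys and values are invented for the example and are not SearchGUI's parameter names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class TextParametersSketch {

    /** Writes the parameters in the plain-text .properties format. */
    static String toText(Properties parameters) throws IOException {
        StringWriter writer = new StringWriter();
        parameters.store(writer, "Search parameters (illustrative keys)");
        return writer.toString();
    }

    /** Parses .properties text back into key/value pairs. */
    static Properties fromText(String text) throws IOException {
        Properties parameters = new Properties();
        parameters.load(new StringReader(text));
        return parameters;
    }

    public static void main(String[] args) throws IOException {
        Properties parameters = new Properties();
        parameters.setProperty("enzyme", "Trypsin");
        parameters.setProperty("precursor.tolerance.ppm", "10.0");

        String text = toText(parameters);
        System.out.println(text); // readable in any text editor

        Properties restored = fromText(text);
        System.out.println("enzyme after round trip: " + restored.getProperty("enzyme"));
    }
}
```

A flat key/value file like this works for simple settings. Nested objects such as user-defined modifications would need a key-naming convention or a structured format such as JSON or XML.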

Serializing and deserializing Java classes to a text-based format such as JSON, YAML, or XML is no rocket science! Just do a Google search for: best way to serialize and deserialize java classes to json

Stop projecting that users are idiots.

regards

Witold Eryk Wolski

mvaudel commented 8 years ago

Dear Witold,

Thank you very much for your suggestion. As a clarification, once you become familiar with our work you will see that considering users to be idiots is light years away from our way of working. I understand that our answers can appear cold and undiplomatic at times, but you have to see that we try to accommodate many users with a limited workforce.

As Harald mentioned, we had to move from text-based parameter files to binary files because we store complex objects in there, like user modifications or enzyme specifications. These objects have many different attributes and, while no rocket science, maintaining an XML export/import scheme for them would have been a tedious task. Also, these conversions are prone to errors that endanger reproducibility. I was not aware of the serialization implementations you point out. Reproducing your Google search, however, points to an old googlecode project which does not seem to be maintained. In general, we try to stay away from unmaintained dependencies for obvious reasons.

You mention that your priority is to keep the collaborators informed of your search parameters. I am not sure whether you want to send them an xml file containing the detail of your ~200 search engine parameters along with the design of your PTMs and enzymes? This does not sound like "human readable" to me? As a core facility we often exchange the search parameters of projects and for this, the easiest appeared to be sending the parameters file and opening it in SearchGUI. You will also see that the search report created along with your result files contains a summary of the parameters used for the search, in a human readable format. Finally, if you use PeptideShaker to interpret the results you will be able to export a certificate of analysis which also contains a summary of the parameters used, and even export the text to include in the methods section of a paper including the parameters. If you export your data as mzIdentML, the parameters are reported there as well, and this is the standard way of doing it in the field.

I hope this convinced you that we actually care about these things, and already implemented quite a lot, respecting the standards of the field. If these are not sufficient for you, we can look into other ways of exporting the parameters? Note that this is extra work and maintenance on our side, so we would appreciate help on this - and a less negative/aggressive attitude.

Best regards,

Marc

wolski commented 8 years ago

Do we really need to discuss why having the parameters in text format would be better for the user?

I have no problem with you saying that you do not have the resources to look into it. But to say binary is good enough and to close the issue? The issue list on GitHub is not only for bugs but also for improvements. If you are on GitHub, then I assume that you are interested in collaborating and attracting other developers to address open issues, including improvements.

"Reproducing your google search however points to an old googlecode project which does not seem to be maintained."

Oh really? Then search-gui was also discontinued? It was previously on Google Code but it is not anymore... The project you are referring to was moved to GitHub, just as SearchGUI was.

Nevertheless, I was NOT pointing to a particular project in the first place. The search does not return just ONE result but 435,000 of them. A brief look at one of the Stack Overflow discussions points to two projects: https://github.com/jdereg/json-io https://github.com/google/gson

  • Serializing Java classes to something more sensible than a binary dump is pretty standard nowadays. Automated marshalling and unmarshalling of Java objects to string representations is standard when developing RESTful APIs (and there are plenty of implementations of the Java javax.ws.rs API and, afaik, JAXB).

Witold Eryk Wolski

mvaudel commented 8 years ago

Dear Witold,

Concerning googlecode, you will notice that we have taken the effort of redirecting users to the updated pages, and I believe the efforts of our team in keeping our web pages updated and documented should be acknowledged. You are very correct when mentioning that we are "interested in collaborating and attracting other developers to address open issues, including improvements". I have a bit of an issue seeing how your comments are going in this direction though?

As said, we would be happy to look into other ways of handling these parameters. However, before we change the structure used by all users and break backward compatibility, we do need to discuss whether it is actually better and worth the effort. We would also need to know to what extent the current solutions do not meet your needs. Finally, we need to define who will take care of the implementation and maintenance effort. Do I understand correctly that you are volunteering for this part?

I will be travelling until tomorrow, so no emails until then, but will be looking forward to a constructive and positive answer on this topic :)

Best regards,

Marc

wolski commented 8 years ago

I am not interested in contributing. I think you first need to open an issue regarding the param file format, define acceptance criteria for such a refactoring, and also provide some general information regarding the current workings, and maybe you will attract a developer.

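I was evaluating SearchGUI for use in a project I am working on.

And for a while it looked actually very good:

  • open source? - Yes
  • Is it written in a reasonable programming language? - Yes
  • Can I easily build it in an IDE with good tooling to debug and fix potential errors? - Yes
  • Modularisation - seems to be existent - Yes
  • platform independent - Yes
  • does it work? Yes - Mostly, and actually more than I would expect considering all the crappy software you try to integrate
  • binary file formats for large data? NO - but this is the problem of this whole academic MS community and not of SearchGUI in particular ... and out of all the crazy text formats you use mgf (it's reasonable).

Some bioinformaticians consciously following software design principles?!

But then:

  • parameter file format is text? - NO
  • have you an issue open about it? - NO

For me there is nothing to discuss. This is a blocker. No go. You can do such a hack to get a functional prototype. But with 5+ search engines integrated you are way beyond a functional prototype. So changing the parameter files to text would be top priority in my opinion.

So I did open the issue.

And with my hopes high that you consciously follow software design principles, I expected you to leave it open. But instead I have learned that adding more features (search engines) is more important to you than refactoring software to an improved design.

Now it seems that most of the good design choices in SearchGUI are a coincidence (you use Java, which does much more right than wrong out of the box, and you use mgf because most of the search engines support it and not because you consider it the least bad choice).

You really need to understand this:

  • Serialize small data in some well annotated easily readable text if it is not confidential. Serialize big data in some standardized binary format.

For now I will just write a wrapper to the few search engines I need in my project myself.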

Witold Eryk Wolski

mvaudel commented 8 years ago

Hi again Witold,

You will be happy to see that I reopened the issue as you suggested. As you said, it might attract a developer, and maybe other people could be interested in this discussion.

Thank you very much for the thorough evaluation of our work - and for the lesson on software design principles. You are correct that we are not professional programmers, simply scientists maintaining bioinformatic tools in our free time. And as such, we accept input in all forms - although some seem to overlook years of hard work and need a bit of self-control to answer politely. I am not sure which search engine you refer to as “crappy software”, but we will be happy to forward the kind words to the original developers.

Concerning our use of formats, we rely on mgf for historical reasons, but support virtually any format as input thanks to the great work of the Proteowizard team (proteowizard.sourceforge.net). Moving toward mzML, the standard of the field, has long been on our todo list and will happen in the coming months. I am not sure whether one should consider our implementation decisions as pure coincidence though.

You are correct, making more search engines available to all has a higher priority to us than having parameters as a text file, and we apologize for it. As already mentioned, our original search parameters were in text, but with the growing list of search engine parameters (~200 at time of writing) and the need to store complex objects, it rapidly became cumbersome to maintain. Instead, we chose to let the users create, edit, and visualize the parameters in our software. We convey them along with the search results, in binary as well as in text in the search report, and allow their export in different formats from PeptideShaker, notably including them in the mzIdentML export, which is the standard of the field. So far our users have been happy with this option, but as said we are open to improvement on this. I am not aware of any standard format for search parameters. On this topic you might want to contact the HUPO-PSI (www.psidev.info). I have looked at exports in JSON with a more experienced programmer. It looks feasible, but we need to verify the handling of backward compatibility before moving to this format.
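One common way to keep a text-based parameter file backward compatible is to store a format version and fall back to defaults for keys that older files lack. The sketch below illustrates the idea with `java.util.Properties`. The key names, default values, and version numbers are assumptions made for the example, and it does not describe how SearchGUI handles this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class VersionedParametersSketch {

    /** Hypothetical version number of the format this reader understands. */
    static final int CURRENT_FORMAT_VERSION = 2;

    /** Loads a parameter file, filling in defaults for keys that older files lack. */
    static Properties load(String text) throws IOException {
        Properties defaults = new Properties();
        defaults.setProperty("format.version", "1");        // files without a version are treated as version 1
        defaults.setProperty("fragment.tolerance.da", "0.5"); // assumed default for a newer key

        Properties parameters = new Properties(defaults);
        parameters.load(new StringReader(text));

        int version = Integer.parseInt(parameters.getProperty("format.version"));
        if (version > CURRENT_FORMAT_VERSION) {
            throw new IOException("Parameter file is newer than this reader: version " + version);
        }
        return parameters;
    }

    public static void main(String[] args) throws IOException {
        // An "old" file that predates the fragment tolerance key.
        Properties old = load("enzyme=Trypsin\n");
        System.out.println("enzyme: " + old.getProperty("enzyme"));
        System.out.println("fragment tolerance: " + old.getProperty("fragment.tolerance.da"));
    }
}
```

Old files keep loading because missing keys resolve to the defaults, and files written by a newer release are rejected explicitly instead of being silently misread.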

Sorry to read that using our tool is a no go for you given this limitation in our design. A workaround might be to use a Galaxy setup, where things are implemented with high programming standards (usegalaxyp.org). Also, the good news is that other academic groups have developed software handling search engines. You might want to check the Trans-Proteomic Pipeline (TPP - tools.proteomecenter.org), OpenMS (www.openms.de), the Tabb Lab software (https://my.vanderbilt.edu/liebler/technology/informatics-pipeline-for-proteomics), Proline (http://proline.profiproteomics.fr), software of the Beijing Genomics Institute (www.genomics.cn), of Andy Jones’ group (http://pcwww.liv.ac.uk/~jonesar/research.html), and Bioconductor (www.bioconductor.org). Apologies for the ones I forgot here; if you are aware of other pieces of academic software for this, we will be happy to know. You might also consider using commercial software, which will provide you with professional standards.

Good luck writing a wrapper for the search engines yourself. It would be great if you could keep us posted on this, so that we can see how this can be handled with high software design principles. Looking at your GitHub account, it seems that we have very different evaluation criteria for bioinformatic software, and I am sure that both sides have something to gain from each other.

Best regards,

Marc

2015-10-07 10:24 GMT+02:00 Witold Wolski notifications@github.com:

I am not interested in contributing. I think you first need to open an issue regarding the param file format, define acceptance criteria for such a refactoring, and also provide some general information regarding the current workings, and maybe you will attract a developer.

I was evaluating search gui for a use in a project I am working on.

And for a while it looked actually very good:

  • open source? - Yes
  • Is it written in a reasonable programming language? - Yes
  • Can I easily build it in an IDE with good tooling to debug and fix potential errors? - Yes
  • Modularisation - seems to be existent - Yes
  • platform independent - Yes
  • does it work? Yes - Mostly, and actually more than I would expect considering all the crappy software you try to integrate
  • binary file formats for large data? NO - but this is the problem of this whole academic MS community and not of Search GUI in particular ... and out of all the crazy text formats you use mgf (it's reasonable).

Some bioinformaticians consciously following software design principles?!

But then:

  • parameter file format: binary - NO
  • have you an issue about it? - NO

For me there is nothing to discuss. This is a blocker. No go. You can do such a hack to get a functional prototype. But with 5+ search engines integrated you are way beyond a functional prototype. So changing the parameter files to a text format would be top priority in my opinion.

So I did open the issue.

And with my hopes high that you consciously follow software design principles, I did expect you to leave it open. But instead I have learned that adding more features (search engines) is more important to you than refactoring software to an improved design.

Now it seems that most of the good design choices in Search GUI are coincidence (you use java - which does much more right than wrong out of the box, and that you use mgf because most of the search engines support it and not because you consider it the least worst choice).

You really need to understand this:

  • Serialize small data in some well annotated easily readable text if it is not confidential. Serialize big data in some standardized binary format.

For now I will just write a wrapper to the few search engines myself.

On 6 October 2015 at 00:20, Marc Vaudel notifications@github.com wrote:

Dear Witold,

Concerning googlecode you will notice that we have taken the effort of redirecting the user to updated pages, and I believe the efforts of our team in keeping the webpages updated and documented should be acknowledged. You are very correct when mentioning that we are "interested in collaborating and attracting other developers to address open issues, including improvements". I have a bit of an issue seeing how your comments are going in this direction though?

As said, we would be happy to look into other ways of handling these parameters. However, before we change the structure used by all users and break backward compatibility, we indeed need to discuss whether it is actually better, and worth the effort. We would also need to know to what extent the current solutions do not meet your needs. Finally, we need to define who will take care of the implementation and maintenance effort. Do I understand correctly that you are volunteering for this part?

I will be travelling until tomorrow, so no emails until then, but will be looking forward to a constructive and positive answer on this topic :)

Best regards,

Marc

2015-10-05 22:10 GMT+02:00 Witold Wolski notifications@github.com:

Do we really need to discuss why having the parameters in text format would be better for the user?

I have no problem with you saying "we do not have the resources to look into it". But to say binary is good enough and to close the issue? The issue list on github is not only for bugs but also for improvements. If you are on github then I assumed that you are interested in collaborating and attracting other developers to address open issues, including improvements.

Reproducing your google search however points to an old googlecode project which does not seem to be maintained.

Oh really? Then search-gui was also discontinued? It was previously on google code but it is not anymore... The project you are referring to was moved to github, just as search gui was.

Nevertheless, I was NOT pointing to a particular project in the first place. The search does not return just ONE result but 435.000 of them. A brief look at one of the stackoverflow discussions points to two projects: https://github.com/jdereg/json-io https://github.com/google/gson

  • Serializing java classes to something more sensible than a binary dump is pretty standard nowadays. Automated marshalling and unmarshalling of java objects to string representations is standard when developing RESTful APIs (and there are plenty of implementations of the java javax.ws.rs API and, afaik, JAXB).
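As a rough illustration of the difference being discussed here, the sketch below writes the same small parameter object once with Java's built-in binary serialization and once as `.properties`-style text. It uses only JDK classes rather than the JSON libraries linked above, and the class `DemoSearchParameters` with its fields and values is hypothetical, not SearchGUI's actual parameter object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StringWriter;
import java.util.Properties;

public class ParameterFormats {

    /** Hypothetical stand-in for a search parameter object. */
    public static class DemoSearchParameters implements Serializable {
        private static final long serialVersionUID = 1L;
        public String enzyme = "Trypsin";
        public double precursorTolerancePpm = 10.0;
        public int maxMissedCleavages = 2;
    }

    /** Binary dump via Java serialization: compact, but opaque to a human reader. */
    public static byte[] toBinary(DemoSearchParameters p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    /** Text dump as key=value pairs, readable in any editor. */
    public static String toText(DemoSearchParameters p) throws IOException {
        Properties props = new Properties();
        props.setProperty("enzyme", p.enzyme);
        props.setProperty("precursorTolerancePpm", Double.toString(p.precursorTolerancePpm));
        props.setProperty("maxMissedCleavages", Integer.toString(p.maxMissedCleavages));
        StringWriter writer = new StringWriter();
        props.store(writer, "Demo search parameters");
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        DemoSearchParameters p = new DemoSearchParameters();
        System.out.println("Binary size in bytes: " + toBinary(p).length);
        System.out.println(toText(p));
    }
}
```

A flat key=value file only covers simple fields. Nested objects such as user-defined PTMs would need a structured format like JSON, which is where libraries such as the two linked above come in.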

On 5 October 2015 at 19:38, Marc Vaudel notifications@github.com wrote:

Dear Witold,

Thank you very much for your suggestion, and as a clarification, once you get familiar with our work you will see that considering users to be idiots is light years away from our way of working. I understand that our answers can appear cold and undiplomatic at times, but you have to see that we try to accommodate many users with a limited workforce.

As Harald mentioned, we had to move from text based parameter files to binary files because we store complex objects in there like user modifications or enzyme specifications. These objects have many different attributes and while no rocket science, maintaining an xml export/import scheme for those would have been a tedious task. Also, these conversions are prone to errors endangering the reproducibility. I was not aware of the serialization implementations you point out. Reproducing your google search however points to an old googlecode project which does not seem to be maintained. In general, we try to stay away from non maintained dependencies for obvious reasons.

You mention that your priority is to keep the collaborators informed of your search parameters. I am not sure whether you want to send them an xml file containing the detail of your ~200 search engine parameters along with the design of your PTMs and enzymes? This does not sound like "human readable" to me? As a core facility we often exchange the search parameters of projects and for this, the easiest appeared to be sending the parameters file and opening it in SearchGUI. You will also see that the search report created along with your result files contains a summary of the parameters used for the search, in a human readable format. Finally, if you use PeptideShaker to interpret the results you will be able to export a certificate of analysis which also contains a summary of the parameters used, and even export the text to include in the methods section of a paper including the parameters. If you export your data as mzIdentML, the parameters are reported there as well, and this is the standard way of doing it in the field.

I hope this convinced you that we actually care about these things, and already implemented quite a lot, respecting the standards of the field. If these are not sufficient for you, we can look into other ways of exporting the parameters? Note that this is extra work and maintenance on our side, so we would appreciate help on this - and a less negative/aggressive attitude.

Best regards,

Marc


mvaudel commented 8 years ago

Note:

The question in the initial post “I can't see a reason for this type of format” was correctly answered by @hbarsnes, and the issue was thus closed. After the discussion above, it seems that the subject of this issue was actually a suggestion to replace the binary parameter files by a text based format - which was not explicitly asked in the original post. I thus reopen the issue and change its label to “enhancement”.

The following should discuss the acceptance criteria for such a refactoring, as suggested by @Wolski, as well as its practical implementation.

hbarsnes commented 8 years ago

Hi Witold,

I'm sorry if my initial reply to your question could be understood as if we didn't take it or our users' needs seriously. This was definitely not my intention. And I'm also sorry that I closed the issue too quickly. Keep in mind that we get lots of questions every week and it can happen that we sometimes answer too quickly without fully having understood the finer details of the question. In such cases, however, we will always re-open the issue if the person reporting it contributes further details (as has been the case for your issue).

What I should have included in my initial reply that could perhaps have cleared up some of the confusion, is that we never considered the serialized search parameter object as an external file format, but rather as our internal way of storing and transferring these parameters between our own tools. We therefore did not consider that other people would have the need to read the content of these files directly. For this purpose we have other methods as Marc has already mentioned (plus that many of the search engines store the search parameters in their own output files).

Additionally, if we talk about the search parameters in general, I would say that we have (at least) two different use cases when it comes to access: i) "normal" user access, and ii) programmer access. For the first I would claim that what we already have in place is more than adequate and provides the user with various ways of displaying and interacting with the search parameters? And having the files in an XML-based format would not really make any difference here, as XML is not for the average software user anyway? For the latter however, I agree that the current solution is perhaps not optimal. But then again, the current search parameter object was never intended for external use, and has been working fine for our internal use for several years. We have therefore not had any incentive to change the format until now. I would also like to add that we already have numerous other users using our software in their own pipelines, and for each of these we have made minor changes and improvements to our structure to accommodate their needs. But as far as I can remember, none of them have complained about the format of the search parameters.

Regarding changing the search parameter file into an XML-based structure, what do you see as the main benefit on your side? And how would you be using these files? In other words, in what way is the current format (and the way we use it) specifically stopping you from using our tools? Furthermore, what is your main objective? Do you only want to run the searches, or also process and combine the results?

Finally, it is of course your choice if you end up not using our tools and rather develop your own wrappers, but it would be too bad to have to redo our more than five years of work just because of this one minor issue with the way we store the search parameters? Especially given that we are more than happy to consider improving/extending this detail if we can arrive at a new structure that we can all agree is better?

Best regards, Harald

wolski commented 8 years ago

Thanks for your replies. I was missing some parts of the 'picture' when I was opening the issue.

I did not expect that, well below all the rather irrelevant progress indicators from the search engines, you print the parameters for the search engines in SearchGuiResult .. html. I rate this information among the most relevant, so I did expect it at the beginning of the report. A further suggestion - just print the parameters of the engines which were actually used. This alleviates the need for a humanly readable par file.

Still, I think that it is a good idea to move to a text based serialization format for your parameter objects. (@Harald I never did mention xml as my first choice. On the contrary, I would prefer yaml but have no clue how good java object serialization to this format is - json is OK). I just recently was involved in refactoring classes, instances of which were serialized to binary blobs for data persistence (configs, as in your case). I do not need this, some might.

Some more issues:

logging and what to log (major)

There is no other log file except the SearchGuiResult? All caught exceptions of Search GUI go to out - so there is no logging in search-gui?

What I definitely need logged and also somehow placed in the results zip folder are:

The aim obviously is to be able to debug the search engines in case of failure directly without the plumbing. Debugging the search engines is not an academic problem.

mzML (major)

@Marc You are mentioning that you are planning to move to mzML. Does this mean that you drop direct mgf file support? NO "SearchCLI -spectrum_files *mgf" in the future?

mgf's are a KISS. MGF's are faster to read, faster to write, smaller when compressed etc. than mzML or mzXML.

If I can't use vendor file formats I prefer mgf (mz5 would be OK too), while mzML and mzXML would be my least preferred choice. I will be converting the files from vendor formats on one system (windows) on the fly and then stage them for searching to a unix cluster.

psidev.info - knock, knock, Oh it's ivory (tower). /\/\/\/\

I do really prefer to reuse existing code or software if possible. I definitely will have a look at the alternatives you have listed.

regards


mvaudel commented 8 years ago

Dear Witold,

Thank you for your suggestions, we appreciate this detailed and constructive feedback, it will help us improve the tools. Point by point answers follow, don’t hesitate to contact us again if something remains unclear, or if we misunderstood your point.

Best regards,

Marc

1- Parameters are at the end of the html file

You are correct and it makes sense to have the parameters first because the input will depend on them. Now we also need to make sure that the user can see right away how the progress went, and does not need to scroll down through all parameters. We will look into improving this.

2- Just print the parameters of the engines which were actually used

I am not sure I understand. All the parameters listed are actually used.

3- Support non java format for serialized parameters

We are currently testing different options for this and will keep you posted with our progress.

4- Logging

There is a log file containing the exceptions, all command lines, and logs of search engines in the ‘resources’ folder of the tool. You can also access it via the “Help” -> “Bug Report” menu of the interface. I understand that this solution is not sufficient for many systems, and will look into a way of providing a log output and will include there all configuration files of third party tools for debugging purposes.

5- mzML

We have no intention of dropping the mgf file support and ‘-spectrum_files *mgf’ will stay. We however need survey spectra in some other applications, and support of mzML in our back-end will de facto result in support in SearchGUI. mzML is the standard of the field so it is good to support it anyway.

wolski commented 8 years ago

Hi,

Thanks for all the information and your effort to accommodate user requests. As you will notice, I am now testing SearchGUI and will unfortunately need to post a few issues soon.

If it is up to me we can close the issue regarding the parameter files.

Thanks a lot.

mvaudel commented 8 years ago

Dear Witold,

We are making good progress with the parameters files, notably thanks to @kverhegg and @nielshulstaert, and would thus like to keep the issue open until a fix has been released. We should have something ready for you to test by next week already!

Best regards,

Marc

wolski commented 8 years ago

Regarding point 2 from your previous mail.

What I mean is: I, for instance, only enabled comet and myrimatch (so only these search engines do a search), but the parameters for xtandem, msamanda, omssa etc. are still printed into the SearchGUIResult.html.
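The behaviour requested here amounts to filtering the report by the set of enabled engines. A minimal sketch of that idea follows. The data layout (engine name mapped to its parameter map) and the setting names are hypothetical, and the code is not based on SearchGUI's report writer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ReportFilter {

    /** Keeps only the parameter sections whose engine is in the enabled set. */
    public static Map<String, Map<String, String>> onlyEnabled(
            Map<String, Map<String, String>> allParameters, Set<String> enabledEngines) {
        Map<String, Map<String, String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> section : allParameters.entrySet()) {
            if (enabledEngines.contains(section.getKey())) {
                kept.put(section.getKey(), section.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Placeholder settings; real engines have their own parameter names.
        Map<String, Map<String, String>> all = new LinkedHashMap<>();
        all.put("comet", Map.of("exampleSetting", "a"));
        all.put("myrimatch", Map.of("exampleSetting", "b"));
        all.put("xtandem", Map.of("exampleSetting", "c"));

        Set<String> enabled = Set.of("comet", "myrimatch");
        // Only the comet and myrimatch sections are printed.
        onlyEnabled(all, enabled).forEach((engine, params) ->
                System.out.println(engine + ": " + params));
    }
}
```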

wolski commented 8 years ago

BTW.... Since we keep the issue of the parameter files open, one more suggestion: would it be possible to have a "save parameter as" button in the search GUI? Currently if I load a parameter file and modify it, it will be overwritten by default with the new options. Sometimes I need to change only a few parameters and save the result as a new search configuration.

Since you mentioned your plans regarding adding mzML support - why don't you open an issue regarding these plans? I am asking because I was wondering which API you are planning to use for reading mzML.

mvaudel commented 8 years ago

Hi again,

Concerning the printing of the parameters of search engines which are not used, you are correct, this is a mistake on our side. We will correct this.

Concerning the 'save as' button, we had it in earlier versions, and it got removed as the gui was getting more complex. Instead, we added this system where if you change a parameter, the tool will ask you where you want to save your changes. I don't know to what extent we can add a save as button without overcrowding the gui, @hbarsnes will have more input on this.

Concerning the mzML issue, we did not open any issue because we never got the time to work on this yet ;)

Best regards,

Marc

wolski commented 8 years ago

Hi Marc,

You are correct.... I did miss the "Overwrite current settings file?" dialog. Thanks.

regards


hbarsnes commented 8 years ago

Hi Witold,

With the releases of SearchGUI 2.2.0 and PeptideShaker 1.2.0 we now use the json format for our search parameters, i.e., the search parameters are no longer in a binary format.

Please give it a go and let us know if we can proceed to close the issue.

Best regards, Harald

mvaudel commented 8 years ago

Dear Witold,

The command line instructions have now been updated according to the new handling of parameters: http://compomics.github.io/projects/searchgui.html http://compomics.github.io/peptide-shaker/wiki/peptideshakercli.html http://compomics.github.io/compomics-utilities/wiki/identificationparameterscli.html

I will now look into the logging as you suggested.

Best regards,

Marc

mvaudel commented 8 years ago

Dear Witold,

We have just released new versions of SearchGUI and PeptideShaker where a '-log' option has been implemented in all command lines. This option redirects all reports and logs to the folder provided by the user, see the respective command line documentation.

If you encounter problems related to this, don't hesitate to contact us again and we will reopen the issue.

Best regards and best wishes for the end of the year celebrations,

Marc