CoastSeg manuscript review

ping #https://github.com/openjournals/joss-reviews/issues/6683

Summary

Fitzpatrick et al. (2024) have made significant effort in improving community-driven satellite-derived shoreline mapping. The authors have refactored the well-established CoastSat software package, resulting in more robust and maintainable code. Additionally, they have developed Jupyter widgets that simplify shoreline mapping for less-technical users by providing them with a user interface. Furthermore, they provide access to a deep learning model trained for image segmentation.

Their efforts to streamline desktop shoreline mapping, a practice now standard in coastal science, are new and likely to make this package widely used. The authors' active engagement with the open-source coastal community further also underscores commitment to maintain this software in the future. I have reviewed the software, and most of it functions effectively. I would like to thank Fitzpatrick and Buscombe for their prompt responses on GitHub, ensuring the software also worked on my machine.

However, I do have a few comments regarding the manuscript and recommend addressing these before publishing.

Software capabilities

While the manuscript presents the software as capable of landcover mapping, it primarily focuses on satellite-derived shorelines (SDS). The landcover mapping functionality appears to be less developed and undocumented in comparison to SDS capabilities. To manage expectations, I would recommend to clearly convey this to the reader, so by not specifically highlighting the landcover mapping capabilities (L11-12), but refrain to SDS.

Additionally, the manuscript mentions a new deep learning (DL) model, but this feature is not extensively covered. If the DL model is intended to be a key feature, it should be properly benchmarked using statistical performance indicators, so that users know when to choose a specific model. If the inclusion of this DL model is a separate effort, it may be better suited for a dedicated paper or feature release. At this stage, I would expect that the DL model, that currently does not take the infrared bands into account, is performing less in wave-dominated environments, because it will probably map the surf zone instead of the shoreline, but I cannot see that - some performance metrics would be very useful! Do you have some benchmark statistics on the Zoo model available? Then I would recommend including these into the paper!

Moreover, what about enhancing the hub with a standardized workflow to benchmark new models? Or maybe add that to the roadmap? Ideally, if you decide to include performance statistics on the DL model, I would recommend to develop a generic workflow (NB based), so that users can benchmark their SDS advancements. Would it, for example, be possible to make a simple connection point between the hub and the iconic coastal sites that were used in Vos et al 2023?

Literature

The current manuscript provides an overview of several satellite-derived shorelines (SDS) methods and applications, but it would be better to give a more comprehensive overview of historical work as several pioneering contributions have been overlooked. To engage with a broader audience and enhance the manuscript's depth, I suggest incorporating additional key references.

[x] Firstly, the manuscript could benefit from including the earliest known SDS references. Please verify and consider adding the earliest references on lines 13-14.

https://www.sciencedirect.com/science/article/pii/S0034425712001174?casa_token=cjDozJ4mn_MAAAAA:j1SB_wiVLoXGyT63NunDeZ2hPIaTL1bMpudTpPW_Ni_q9R3GUAKd68sOvss9iaWcASnFMxNw6RY

https://www.sciencedirect.com/science/article/pii/S002532271530086
[ ] Additionally, it is important to recognize a wider array of shoreline mapping efforts. The composite method for shoreline mapping is extensively discussed in Luijendijk 2018 and Hagenaars et al 2018. If the reference to Bishop-Taylor (2021) is there because that software is openly available, I would say that that is an additional feature that is worth mentioning separately.
[x] Finally, if the reference shoreline that you provide globally is from Sayre et al 2018, I guess you should add a reference (and maybe check the license).

I guess that by incorporating these references, the manuscript will not only provide a more thorough historical overview but also engage with a wider audience.

What is the difference between coastsat-package and CoastSat?

The coastsat-package on PyPi points to the CoastSat repository, but the description explains that its a slightly modified version. What are the differences? Where is this slightly-different version hosted? Who is going to maintain the coastsat-package? How are CoastSat community efforts cascaded into coastsat-package? What do we reference if we map shorelines using CoastSeg that wraps CoastSat underneath?

Description from the PyPi repository:

"""" This is coastsat-package the pip and conda package extension of CoastSat. CoastSat-package is a slightly modified version of coastsat to make it compatible with CoastSeg. """

Minor

L21-22: add a URL to GitHub repository?
L50: IMHO, I don't consider CoastSeg a platform. I would say it's a (very extensive) collection of interactive widgets that run Jupyter notebook, that together are an application. Maybe you could use (Desktop) Jupyter UI/App?
L177: doesn't is colloquial, maybe better "does not"
233-34: should be Normalized Difference Water Index

Hi @FlorisCalkoen

First of all thank you for reviewing CoastSeg. We appreciate the suggested improvements for CoastSeg and they have helped make CoastSeg a more robust piece of software. Thank you for working with us throughout this review process and for taking the time to understand the different design decisions behind CoastSeg.

While the manuscript presents the software as capable of landcover mapping, it primarily focuses on satellite-derived shorelines (SDS). The landcover mapping functionality appears to be less developed and undocumented in comparison to SDS capabilities. To manage expectations, I would recommend to clearly convey this to the reader, so by not specifically highlighting the landcover mapping capabilities (L11-12), but refrain to SDS.

In the zoo workflow, CoastSeg first performs landcover mapping followed by a pixel-wise segmentation of the geospatial image to extract the shoreline. Applying a landcover mapping model to identify additional features in the image, such as sand, vegetation, whitewater, and water, to then find the shoreline is a common practice in SDS. In the manuscript, we do not claim that CoastSeg does anything beyond SDS. We anticipate future contributions to continue to focus solely on SDS, however, a future optimal solution to SDS may involve landcover mapping as an intermediate step, like the zoo workflow within CoastSeg presently does. We believe this is clear, and leave the reference to landcover mapping as-is. It is technically correct, and encourages the community to continue to explore landcover mapping (i.e. dense pixelwise predictions on maps) as a necessary intermediate step in SDS.

Additionally, the manuscript mentions a new deep learning (DL) model, but this feature is not extensively covered. If the DL model is intended to be a key feature, it should be properly benchmarked using statistical performance indicators, so that users know when to choose a specific model. If the inclusion of this DL model is a separate effort, it may be better suited for a dedicated paper or feature release. At this stage, I would expect that the DL model, that currently does not take the infrared bands into account, is performing less in wave-dominated environments, because it will probably map the surf zone instead of the shoreline, but I cannot see that - some performance metrics would be very useful! Do you have some benchmark statistics on the Zoo model available? Then I would recommend including these into the paper!

The new deep learning model, used in the Zoo workflow mentioned in the manuscript, does not currently have benchmark statistics for its performance on coastal environments. We are working on a separate paper that will benchmark the models’ performance on a variety of coastal landscapes, including those used in the SDS benchmark. We plan to take a similar approach by co-author Kilian Vos, and create a separate repository and paper for benchmarking the deep learning models in CoastSeg. We have already provided preliminary validation statistics of the model with the published models (see the links we have made to the various models in the documentation). However, these only speak to the quality of image segmentation on a limited validation dataset, and not the quality of extracted SDS, hence the need for a separate paper. This is a software paper, that describes the CoastSeg project. JOSS papers are short and we don't have the space to include all the details you demand, but also these details are out of scope. Finally, our deep learning models do use NIR and SWIR. As documented, we have developed several models; one that uses RGB only, one that uses MNDWI, and one that uses NDWI. We look forward to sharing this work in peer-reviewed paper format shortly. In the meantime we have updated our website to state where to find the statistical performance indicators for each model, a description of the current spectral indices our models support, and an update on the status of our model benchmarking exercise.

Moreover, what about enhancing the hub with a standardized workflow to benchmark new models? Or maybe add that to the roadmap? Ideally, if you decide to include performance statistics on the DL model, I would recommend to develop a generic workflow (NB based), so that users can benchmark their SDS advancements. Would it, for example, be possible to make a simple connection point between the hub and the iconic coastal sites that were used in Vos et al 2023?

CoastSeg is primarily designed for SDS workflows, and we plan to keep any developments for benchmarking models within a separate repository, similar to how CoastSat has SDS Benchmark to track its benchmarking efforts. We have a guide for contributing models, and users will be able to benchmark their models in the separate repository. That being said, benchmarking models is outside the scope of CoastSeg’s intended functionality.

Firstly, the manuscript could benefit from including the earliest known SDS references. Please verify and consider adding the earliest references on lines 13-14.

Thank you for suggesting some additional references to include in the manuscript. Since this is a software paper, not a peer-reviewed research article, we believe our referencing is sufficient, including recent review papers that provide an overview of the state of the field. Since ours is an extension of open source SDS software, naturally we reference only those open source contributions.

Finally, if the reference shoreline that you provide globally is from Sayre et al 2018, I guess you should add a reference (and maybe check the license).

We added the reference for the Sayre et al. 2018 shorelines to the paper. The license linked here states that it is permissible for non-commercial use, which aligns with our intended usage.

What is the difference between coastsat-package and CoastSat? The coastsat-package on PyPi points to the CoastSat repository, but the description explains that its a slightly modified version. What are the differences? Where is this slightly-different version hosted? Who is going to maintain the coastsat-package? How are CoastSat community efforts cascaded into coastsat-package? What do we reference if we map shorelines using CoastSeg that wraps CoastSat underneath?

In the paper, we listed a number of improvements we have made to the original CoastSat workflow. This forms a whole section of the paper, in fact, so we're a little unsure of the confusion. Simply put, CoastSeg relies on the coastsat-package because CoastSat does not currently have an officially supported package available. The coastsat-package is a slightly modified version of CoastSat, with changes documented under the What’s Changed section on the package's ReadMe page (as well as listed in the paper at a higher level). These modifications include minor organizational adjustments, additional parameters for functions to give end users more control, and changes to the create jpg function to default to creating a jpg. The design of the coastsat-package ensures it remains closely aligned with CoastSat's workflow while providing added utility for CoastSeg.

The CoastSeg team presently maintains the coastsat-package, but since it is open-source, contributions from the broader community are welcome and encouraged. The package's description and homepage have been updated to link to the actual coastsat-package repository instead of directly to CoastSat. When changes are made to CoastSat, the coastsat-package maintainers updates the relevant code to maintain synchronization between the two. Consequently, any references to CoastSat using the coastsat-package should be considered references to CoastSat itself.

@2320sharon,

Thank you very much for your detailed answer! Although I'll leave it up to you and the editor @cheginit to decide, I would like to say that I still have some remaining concerns on a few points.

I still believe the manuscript would benefit from referencing key SDS studies from a broader scientific community, even though it is a software paper.
It is not ideal to release scientific software without benchmark statistics. While it is promising to hear that you are preparing a manuscript on this topic, relying on future publications is risky and it eventually might impact downstream science. I appreciate that you have linked to a Zenodo repository for the ML model. However, the performance metrics provided as numpy arrays may not be user-friendly for non-programmers or even experts, as these arrays do not have any labeling. It would be more accessible to present these metrics directly on the CoastSeg website or as text in the Zenodo repository.
Finally, the license that is used by Sayre et al 2018 states that: """ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. """

I'm not entirely sure how you use the coastline exactly, but probably you have to check that you do not break their clause. I read somwhere the NoDerivates implies that "If you remix, transform, or build upon the material, you may not distribute the modified material."
You are using a GNU License, which is more open and conflicts with with the license used by Sayre et al 2018.

Again, I'm not entirely sure how you use your coastline, but this is probably an action point to check. Maybe you can reconsider your use of the coastline? Or discuss this with Sayre et al 2018? Or adopt seperate licensing?

@FlorisCalkoen

Thank you for your detailed review of CoastSeg. I hope to address some of the points you raised here:

Regarding your suggestion about citing the additional paper, we have opted not to do so because the dataset used in that paper is not open source. There are many SDS papers going back more than 10 years and we do not have sufficient space for a comprehensive literature review, however we link to recent review and benchmark papers that better review the current state of the science.
- We will update the CoastSeg website with the validation statistics and metrics from the zoo. Additionally, we plan to provide even more detail as we continue our model validation exercises.
Thank you for raising the question about the license used by Sayre et al. (2018). In CoastSeg, we do not transform or alter the data from the coastline; instead, we simply segment the larger coastline into smaller chunks. Since we credit the original authors of the dataset and do not modify the coastline data, we believe we are adhering to the terms stated in the license. @cheginit what do you think about this?

Thank you again for reviewing CoastSeg and for all your help.

@FlorisCalkoen Thanks for providing a thorough review of the paper with constructive comments.

@2320sharon Thanks for working on addressing the issues raised by @FlorisCalkoen.

@2320sharon, in response to:

To manage expectations, I would recommend to clearly convey this to the reader, so by not specifically highlighting the landcover mapping capabilities (L11-12), but refrain to SDS.

you stated:

We believe this is clear, and leave the reference to landcover mapping as-is. It is technically correct, and encourages the community to continue to explore landcover mapping

But, after reading the paper, I agree with @FlorisCalkoen, this point is not clear to me as well. Please make the necessary changes to address this issue.

Regarding this point that you made:

We are working on a separate paper that will benchmark the models’ performance on a variety of coastal landscapes, including those used in the SDS benchmark. We plan to take a similar approach by co-author Kilian Vos, and create a separate repository and paper for benchmarking the deep learning models in CoastSeg.

Please add a few sentences to the paper (and documentation), explaining this point, so user are aware of the state of this part of the software before using it.

About the benchmarking issue, I agree with these points:

CoastSeg is primarily designed for SDS workflows, and we plan to keep any developments for benchmarking models within a separate repository, similar to how CoastSat has SDS Benchmark to track its benchmarking efforts. We have a guide for contributing models, and users will be able to benchmark their models in the separate repository. That being said, benchmarking models is outside the scope of CoastSeg’s intended functionality.

We will update the CoastSeg website with the validation statistics and metrics from the zoo. Additionally, we plan to provide even more detail as we continue our model validation exercises.

Please reflect this properly in the paper and documentation, so users know about the ongoing efforts.

On the issue of adding new citation and references, note that we have soft limit on the number of pages for the paper. So, please add the requested references, you don't have to provide a comprehensive literature review, but notable, important, and relevant previous work (regardless of their publication date) should be cited.

About this:

In the paper, we listed a number of improvements we have made to the original CoastSat workflow. This forms a whole section of the paper, in fact, so we're a little unsure of the confusion.

Please make the necessary changes to the paper and the documentation to clarify the differences and improvements, then ask @FlorisCalkoen if the changes address the confusion.

Regarding the license issue, I will get back to you. In the meanwhile, please also consult with other license experts, just to make sure we are seeking opinions of several people on this.

As a general note, whenever you make any changes to address a specific comment, whether in the paper or the software, it's important to clearly reference the changes that you made in the repo and ask @FlorisCalkoen if the change addresses the issues.

Hi @cheginit

Thank you for reviewing this issue and your input on the issue. I appreciate your guidance throughout this process. I will address each of the points raised and update the issue accordingly.

To begin:

To manage expectations, I would recommend to clearly convey this to the reader, so by not specifically highlighting the landcover mapping capabilities (L11-12), but refrain to SDS.

I will address this issue by removing the reference to landcover mapping capabilities from the paper.

Regarding the benchmarking mentioned:

We are working on a separate paper that will benchmark the models’ performance on a variety of coastal landscapes, including those used in the SDS benchmark. We plan to take a similar approach by co-author Kilian Vos, and create a separate repository and paper for benchmarking the deep learning models in CoastSeg.

I have already updated our documentation for the zoo workflow to state that:

These models have not been thoroughly tested yet, but we are currently undergoing the process of benchmarking these models in a variety of coastal environments. We will be documenting the results of this benchmark in a separate repository and we will link it here when its ready.

This can be found on the documentation website here . I'll also add a sentence to the Project Roadmap section of the paper that explains the models used in the deep learning based SDS workflow are actively being benchmarked at several locations.

Regarding the new citations and references:

On the issue of adding new citation and references, note that we have soft limit on the number of pages for the paper. So, please add the requested references, you don't have to provide a comprehensive literature review, but notable, important, and relevant previous work (regardless of their publication date) should be cited.

I'll add the requested references to the paper.

Regarding coastsat_package:

In the paper, we listed a number of improvements we have made to the original CoastSat workflow. This forms a whole section of the paper, in fact, so we're a little unsure of the confusion.

Please make the necessary changes to the paper and the documentation to clarify the differences and improvements, then ask @FlorisCalkoen if the changes address the confusion.

I'll make a page on the CoastSeg website that explains why we use coastsat_package instead of CoastSat and link the coastsat_package repository that lists the improvements & changes made to it. I will check with Floris Calkoen if this resolves the confusion.

As for the license issue:

Regarding the license issue, I will get back to you. In the meanwhile, please also consult with other license experts, just to make sure we are seeking opinions of several people on this.

Thank you for looking into the license issue. We are confident that we are not violating the terms of the license as we do not modify the data and provide multiple citations of the source. We will consult other experts to confirm this.

I'll be sure to tag @FlorisCalkoen to validate that the changes I make address the issues raised.

Thank you both for your time and for all your help in this review process.

Hi @cheginit I have looked into the license issue and believe I have a definite answer.

To recap, the dataset in question is https://pubs.usgs.gov/publication/70202401. This dataset is public domain, since it originates from the U.S. Geological Survey (USGS), a US government agency whose works are not subject to copyright. It technically is a USGS data product because the first author is a USGS employee https://www.sciencebase.gov/catalog/catalogParty/show?partyId=9062. All works by US government employees are public domain.

The reviewer said that the dataset has a "Creative Commons Attribution-NonCommercial-NoDerivatives License" and they link to https://creativecommons.org/licenses/by-nc-nd/4.0/

However, the reviewer is mistaken. He refers to the license of the paper, not the dataset. The license language reads:

""" This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way. """

This comes from the journal paper’s license: https://www.tandfonline.com/action/showCopyRight?scroll=top&doi=10.1080%2F1755876X.2018.1529714

Note that the license specifically refers to the article that describes the dataset, not the dataset itself. Note the license solely refers to the ‘Open access article’. I believe this settles the matter. Otherwise, I'd be happy to get the U.S. Geological Survey legal team involved to clarify the situation over licensing of datasets that originate from the US government.

@dbuscombe-usgs Thanks for looking into this and resolving the matter by providing valid points, appreciate it.

@2320sharon, @FlorisCalkoen Please consider the license issue resolved.

Hi @FlorisCalkoen

I have implemented the suggestions you provided and would appreciate your review of the updates. Here are the changes made:

Enhancements to CoastSat Package

I updated the coastsat_package README with a more thorough list of the enhancements we've made. Some of these enhancements were completed after we wrote the paper, which is why they were not mentioned originally. You can see the new readme here Additionally, I have updated the paper.pdf with a sentence describing some of the enhancements at a high level on lines 55-58. Please let me know if you think this provides sufficient detail.

Additional References

I've added a citation for the requested paper to line 15.

Benchmarking

I've also added the a sentence that informs the reader that we are currently benchmarking the models to the lines 232-234.

Minor Changes

All the minor changes have been updated and incorporated into the paper.

Thank you again for your time reviewing CoastSeg.

@FlorisCalkoen thank you for reviewing CoastSeg, since you said you finished your review can I close this issue?

SatelliteShorelines / CoastSeg