TomasBeuzen / pybeach

A Python package for locating the dune toe on cross-shore beach profile transects.
MIT License

Discussion regarding profile classification for crest detection #10

Open benjaminh opened 4 years ago

benjaminh commented 4 years ago

Hello there,

My colleagues and I work for a coastal observatory, and I wrote some Python code to process LiDAR data. Our method is similar to the RR (relative relief) approach provided in your package, yet quite different on some points. I recently came across your paper and this repo, which was a nice discovery, as you offer some interesting ideas using ML.

Though I did struggle a bit to make an editable install (I'm not used to poetry) and to use my own data (opposite profile direction, profile point numbering not starting at 0, failures to detect a dune toe on some profiles, etc.), the results are very interesting! We are thinking of creating our own classifier based on our data, but I'm unsure whether we have enough training data at the moment (~300 profiles vs. the ~1500 you used to create your models).

While the dune toe detection results are satisfying so far, we still have trouble detecting the correct crest, whether we take a geomorphological approach or a risk management approach, and depending on the profile typology.

For example, the highest z value sometimes lies too far on the landward side, whereas the highest peak sometimes turns out to be a tiny peak on the beach side.

So here is the main question: did you try to apply your ML method (using a random forest classifier) to detect the dune crest as well? Similarly, we observe that crest detection is sensitive to profile morphology (e.g. reflective vs. dissipative); did you try, or do you know of, any approach that classifies profiles before detecting the crest/toe?

JakubP23 commented 3 years ago

Hi there,

I saw that you were also working on calculating different parts of a coastal dune. I am currently helping with a research project in which we measure dunes using LiDAR data. I was wondering how you were able to input your data into pybeach and get results back. It would be great if you could help me. Thanks

benjaminh commented 3 years ago

Loading data to be processed by pybeach was pretty straightforward in my case:

For a simple use case, I end up with something like this:

import numpy as np
import pandas as pd
from pybeach.beach import Profile

input_data_file = '/path/to/my_csv_file.csv'
profile = pd.read_csv(input_data_file, delimiter='#', decimal=',')

x = np.arange(len(profile))
z = profile['z'].to_numpy()[::-1] # I need to revert points since Pybeach expects profile to be seaward

p = Profile(x, z)
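
For readers whose transects are recorded in the opposite direction, here is a minimal sketch of the reversal step in plain Python. It assumes, as the comment in the snippet above says, that pybeach expects the profile to run seaward (land first, sea last), and it adds a heuristic that is not part of pybeach: the landward end of the transect is taken to be the higher one. The helper name `orient_seaward` and the elevations are hypothetical.

```python
def orient_seaward(z):
    """Return elevations ordered from the landward end to the seaward end.

    Assumption (not from pybeach): the landward end of the transect is
    higher than the seaward end, so a profile that starts lower than it
    ends is taken to be recorded sea -> land and gets reversed.
    """
    z = list(z)
    if z[0] < z[-1]:
        z.reverse()
    return z


# Hypothetical elevations (m), recorded from the sea towards the land.
z_raw = [0.2, 0.8, 1.5, 4.0, 3.6]

z = orient_seaward(z_raw)
x = list(range(len(z)))  # unit spacing, like np.arange(len(profile)) above

print(z)
print(x)
```

The heuristic can fail on transects that end in a low back-dune area, so it is worth plotting a few profiles before trusting it.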

Hope it helps, good luck

JakubP23 commented 3 years ago

Hi, thank you for the helpful advice and the code. I was also wondering what you used to convert your .las files into a .csv file. Did you use Python to do this, or some other tool?

pwernette commented 3 years ago

This discussion is interesting and brings up several good ideas about leveraging the ML approach with crest identification. I wonder if it makes a difference if metrics used in the ML model are computed along a profile or using a planform area.

As to @JakubP23's most recent comment, lastools has a utility to convert LAS/LAZ files to CSV (and other ASCII file formats), and it works as a command line utility. Otherwise, I've found the laspy package for Python very user-friendly and versatile. It looks like the laspy package also utilizes lastools utilities but wrapped up in a Python workspace, although I could be mistaken. Either way, converting LAS/LAZ to ASCII can be done via lastools (stand-alone utility) or the laspy package in Python.

JakubP23 commented 3 years ago

Hello once again. I managed to get a csv file in order to be able to use it in beach.py. I believe that I set the correct path for the .csv file and tried to run it as is. I received an error and haven't been able to figure out how to fix it. I was hoping you would maybe know what I am doing wrong. I will add the code + the terminal with the error. Thanks in advance.

import numpy as np
import pandas as pd
from pybeach.beach import Profile

input_data_file = '/Users/kuba2001/Desktop/A1/LiDAR_Points.csv'
profile = pd.read_csv(input_data_file, delimiter='#', decimal=',')

x = np.arange(len(profile))
z = profile['z'].to_numpy()[::-1] # I need to revert points since Pybeach expects profile to be seaward

p = Profile(x, z)

[Image: Screen Shot 2021-08-05 at 1 46 04 PM]

benjaminh commented 3 years ago

According to your screenshot and error message, I believe your csv file doesn't have the same structure as mine. Can you show an excerpt of your csv file? Do you have a column named 'z'?

JakubP23 commented 3 years ago

I saw the same thing; the z column is named 'IBSP_TIFF'. I entered this too and received the same error, just with the name IBSP_TIFF instead of z.

[Image: Screen Shot 2021-08-05 at 2 07 42 PM]

TomasBeuzen commented 3 years ago

@JakubP23 - as far as I can see, the error is not with pybeach but the way you're loading and wrangling your data. Specifically, the traceback is pointing out this line:

z = profile['z'].to_numpy()[::-1] # I need to revert points since Pybeach expects profile to be seaward

(By the way, it's much easier for others to read your code if you put it in a Markdown code block using 3 back ticks - see the Markdown docs here - for example, in the code block above I'm using ```python)

Anyways, based on the csv file you've provided, could you try the following:

input_data_file = '/Users/kuba2001/Desktop/A1/LiDAR_Points.csv'
profile = pd.read_csv(input_data_file, delimiter='#', decimal=',')
x = np.arange(len(profile))
z = profile['IBSP_TIFF'].to_numpy()[::-1] # I need to revert points since Pybeach expects profile to be seaward
p = Profile(x, z)

I think you said you tried this, but it should work. If it doesn't, could you show what your loaded data frame, profile, looks like? That would be very helpful. I'm particularly interested in why your delimiter is a pound sign (#); I've never seen that!

benjaminh commented 3 years ago

@JakubP23 As @TomasBeuzen said, I guess you copied my code snippet with the delimiter I use (#), but your csv file needs to be structured the same way. To confirm this, just try to print profile.head() to get a summary (first 5 rows) of your dataframe and verify that it corresponds to your csv data.

@TomasBeuzen I use the # symbol as the delimiter in my csv files out of habit, because I find it useful: it is easy to spot when looking at the data, and it rarely appears in the data itself, which limits errors (especially when working on text data) compared to commas, semicolons or tabs.

To get back to the original discussion and @pwernette's comment: we did some experiments using a RandomForest classifier to identify crests from a single profile based on some signal processing features. Our method differs slightly from what is done in Pybeach, but I would be glad to discuss the subject. I didn't try it on a planform area, though this is an interesting lead.

JakubP23 commented 3 years ago

@benjaminh how do I print the profile.head()? Do I add it into the pybeach code?

@TomasBeuzen I tried to run the code that you suggested with the IBSP_TIFF and it gave me a similar error.

[Image: Screen Shot 2021-08-09 at 3 11 31 PM]

benjaminh commented 3 years ago

@JakubP23 Well, just add print(profile.head()). This is just some pandas basics, not related to pybeach; I suggest you have a look at the pandas documentation. But you are free to use any other means of reading an external data source; I just provided a snippet corresponding to my habits with pandas.

Once again, your error must be due to the structure of your data file and the way you load it in Python; it has nothing to do with Pybeach. Just make sure the delimiter used in your file is the same as the one declared when calling the read_csv function. As I said before, have a look at your csv file (using a text editor) to confirm which delimiter it uses.

JakubP23 commented 3 years ago

Hi again. All your advice has been helpful and I have been trying to debug the problem myself. I simplified the CSV file so that it only shows the three columns (x, y, z). After I did this I am still getting the same error. I was hoping you would be able to help me with this, as I am not the most familiar with CSV files. If you can, I would send over the CSV file to see if it works for you. Below I have attached the error that I am getting with the updated CSV file.

[Image: Screen Shot 2021-08-11 at 3 37 20 PM]

benjaminh commented 3 years ago

Same problem, same solution. Your delimiter in the csv file is clearly a comma , and not a #, so pandas can't interpret what your columns are and you get just one column named ,z,x,y. Just compare your output to what I told you in my first answer : https://github.com/TomasBeuzen/pybeach/issues/10#issuecomment-884008959. This was just an example, you can define whatever delimiter you want but just make sure the one declared in read_csv is the same as in your csv file.
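
The mismatch described above can be reproduced with the standard library alone. This is a minimal sketch, not pandas itself: the sample rows are hypothetical, modelled on the header `,z,x,y` mentioned above, and `guess_delimiter` is a hypothetical helper that simply counts candidate characters in the header line.

```python
import csv
import io

# Hypothetical file content with the header described above (",z,x,y").
sample = ",z,x,y\n0,2.31,0.0,10.0\n1,2.12,1.0,10.0\n"


def guess_delimiter(text, candidates=(",", ";", "#", "\t")):
    """Return the candidate that occurs most often in the header line."""
    header = text.splitlines()[0]
    return max(candidates, key=header.count)


def read_rows(text, delimiter):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


wrong = read_rows(sample, "#")                     # delimiter not in the file
right = read_rows(sample, guess_delimiter(sample)) # delimiter taken from the file

print(list(wrong[0].keys()))  # a single fused column name
print(list(right[0].keys()))  # separate columns, including "z"
print(right[0]["z"])
```

With the wrong delimiter every line is read as one field, so looking up a column called `z` fails. Once the declared delimiter matches the file, the lookup works.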

Once again, just look at some documentation on how to read data in Python (my way with pandas and csv files is not the only way); there are tons of tutorials on the web. Just read your data, check it, then load it into Pybeach. By the way, for any generic coding issue like this, you would be better off asking for help on Stack Overflow.

TomasBeuzen commented 3 years ago

@JakubP23 - I appreciate you may be new to Python but as has been mentioned a few times in this thread, your issue is not with pybeach but some simple pandas wrangling. You need to learn how to use pandas, in particular the read_csv() function. The best piece of advice I can give is that you need to learn how to search documentation (here's the read_csv() function) and learn how to debug your issues on your own, for example, simply typing "KeyError: pandas" in google gives you plenty of results on how to fix your problem.

In saying that, based on your csv file above, I feel like this should work:

input_data_file = '/Users/kuba2001/Desktop/A1/LiDAR_Points.csv'
profile = pd.read_csv(input_data_file)
x = np.arange(len(profile))
z = profile['z'].to_numpy()[::-1] # I need to revert points since Pybeach expects profile to be seaward
p = Profile(x, z)

If that doesn't work, I encourage you to debug your pandas issue on your own. These issue threads are for issues with pybeach, not generic coding problems.

JakubP23 commented 3 years ago

Hello again. I spent some time on the csv file and I think I figured out why it wasn't working. I got this part to run with no errors, which is good. If you don't mind me asking another question: once I have this code, how do I add it to pybeach? I tried to follow your pybeach example in the Jupyter notebook, but I see you used a pkl file. Is there a way to just use this code instead of turning it into a pkl file?

Once again, thank you for your help with this whole issue that I have been having. I hope it's not too much trouble for you. Thanks.

JakubP23 commented 3 years ago

@benjaminh Hi, so I was able to get that piece of code for the CSV file to run without any errors. I was hoping you could tell me what your next steps were from here. How were you able to get results from pybeach and produce the charts that pybeach prints out? Thanks.

JakubP23 commented 3 years ago

Hello. Hope you are doing well. I have been trying to get my data to work, but with the CSV format code it doesn't work for me. I tried pickling the CSV file to get a pkl file like the one you use for the test sites. I keep getting an error on the custom classifier. I believe it is because of the formatting of my data. Since this was the issue, I tried un-pickling your test data to see what it looks like. It would not unpickle; I get an error (ValueError: All arrays must be of the same length). Would you be able to point me in the right direction as to what the formatting must look like? Thank you!

benjaminh commented 3 years ago

Could you please at least open a different thread, as your question is not related to the initial discussion? Thanks. Also, please describe your problem somewhere else, like Stack Overflow, as it seems related to your code and not to Pybeach.

TomasBeuzen commented 3 years ago

Agreed - @JakubP23 please create a new issue if you have a problem with pybeach specifically. Otherwise, Stack Overflow is where you should go for general coding issues. I think the example notebook, or even the usage section, makes this clear: regardless of how your data is stored and loaded, it's your job to get an x and z in the following format:

        x : ndarray
            Array of cross-shore locations of size (m,).
        z : ndarray
            Array of elevations matching x. May be of size (m,) or (m,n).

It's not pybeach's job to do this for you; you need to do it based on how your data is stored and loaded.
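
As a rough illustration of checking your own data against that format before handing it to pybeach, here is a minimal sketch for the single-profile case, where x and z both have size (m,). The function `check_profile` is hypothetical and not part of pybeach, and the finite-value check is an assumption about what is sensible for survey data rather than a documented pybeach requirement.

```python
import math


def check_profile(x, z):
    """Return (x, z) as equal-length lists of finite floats.

    Hypothetical helper covering a single profile only; raises ValueError
    when the lengths differ or a value is NaN or infinite.
    """
    x = [float(v) for v in x]
    z = [float(v) for v in z]
    if len(x) != len(z):
        raise ValueError(f"x has {len(x)} points but z has {len(z)}")
    if not all(math.isfinite(v) for v in x + z):
        raise ValueError("x and z must not contain NaN or infinite values")
    return x, z


# Hypothetical profile: five cross-shore locations and matching elevations.
x, z = check_profile(range(5), [3.6, 4.0, 1.5, 0.8, 0.2])
print(len(x), len(z))
```

The checked lists can then be converted to arrays, for example with `numpy.asarray`, before creating the `Profile`.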

Now, answering some questions in the original thread so we can potentially get some closure here...

> Did you try to apply your ML method (using a random forest classifier) in order to detect the dune crest as well?

I did do this, experimentally, after creating pybeach and it worked well. As with the ML dune toe detection, I needed to have manually labelled crests to train the model, but after doing so, the model worked well (I had a quick look at my notes and it was about 80-90% accuracy). Of course, this depends on how well I identified the crests for detection in the first place. With pybeach, we had multiple people identify dune toe locations before training the model. When I did the crest work, it was just me labelling crests - admittedly they're easier to pick, but it needs to be a more formal process. I don't have code or data to share for doing this at this time, but wanted to flag here that it does seem to be possible.

> And in a similar way, we observe that the crest detection is sensitive to profile morphology (like reflective vs. dissipative one), did you try or do you know any approach that makes use of profile classification before detecting the crest/toe?

I didn't do this but it's a great idea. Presumably you could have a meta-model that classifies the beach type before then applying a morphology-specific model. The obvious challenge is getting all the data needed to train the models! But tools like coastsat are making that easier.
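
To make the meta-model idea concrete, here is a minimal sketch of the dispatch structure only. Everything in it is an assumption for illustration: the slope threshold is a hypothetical placeholder, the two "models" are simple stand-in rules rather than trained classifiers, and the mapping from beach type to rule is arbitrary. None of it comes from pybeach.

```python
def mean_slope(x, z):
    """Average seaward slope: elevation drop per unit cross-shore distance."""
    return (z[0] - z[-1]) / (x[-1] - x[0])


def classify_profile(x, z, threshold=0.05):
    """Label a profile 'reflective' (steep) or 'dissipative' (gentle).

    The threshold is a hypothetical placeholder, not a published value.
    """
    return "reflective" if mean_slope(x, z) >= threshold else "dissipative"


def highest_point(x, z):
    """Stand-in model: index of the maximum elevation."""
    return max(range(len(z)), key=lambda i: z[i])


def most_seaward_peak(x, z):
    """Stand-in model: index of the local maximum closest to the sea."""
    for i in range(len(z) - 2, 0, -1):
        if z[i - 1] < z[i] >= z[i + 1]:
            return i
    return highest_point(x, z)


# Arbitrary mapping; in practice each entry would be a trained model.
MODELS = {"reflective": highest_point, "dissipative": most_seaward_peak}


def predict_crest(x, z):
    """Classify the profile first, then apply the model for that class."""
    beach_type = classify_profile(x, z)
    return beach_type, MODELS[beach_type](x, z)


# Hypothetical profiles, ordered from land to sea.
gentle = ([0, 20, 40, 60, 80, 100], [3.0, 5.0, 4.0, 4.5, 2.0, 0.0])
steep = ([0, 10, 20, 30], [4.0, 6.0, 3.0, 0.0])

print(predict_crest(*gentle))
print(predict_crest(*steep))
```

The two stand-in rules mirror the two failure modes raised at the top of the thread (the highest point versus a small seaward peak), which is why a per-class choice of model could matter.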