adjtomo / seisflows

An automated workflow tool for full waveform inversion and adjoint tomography
http://seisflows.readthedocs.org
BSD 2-Clause "Simplified" License
172 stars 122 forks source link

Issue with adjoint in the Inversion Workflow #210

Open scott314159 opened 2 months ago

scott314159 commented 2 months ago

I am trying to build off of seisflows example 2 by creating a more exploration (higher frequency 1-10Hz, smaller model ) workflow. I built two segy elastic velocity files (INIT and TRUE) and created the initial data files in OUTPUT_FILES_TRUE and OUTPUT_FILES_INIT. All looks good at this phase. Forward modeling produces realistic seismograms (AA.S*.BXY files as well as jpeg snapshots). After setup I run "seisflows submit".

The difference between my parameter.yaml file and example 2 is the directory names (pathmodel*), the min/max period has been changed to reflect the frequencies of interest, and the smoothing length is a bit shorter.

The difference between example 2 and my run in the specfem/DATA/Par_file are a bit more substantial. Most of the differences are in the source/station geometry and file locations but as far as I can see. A geophysical parameters that is different is T0 =0 vs. 48 since I am at the surface and the model definition (coming in from segy using sgy2tomo.py from the Marmousi2 specfem2d example vs. using the internal par file definition). I also changed some of the sampling and listening times to conform with the nature of the experiment (short offsets, shorter listening time, higher frequency smaller sampling rate). Since wavefield snapshots show up on the jpegs along with the stations locations I assume I did not mess this up too much.

The initial and true models are different. I plotted them out and the waveforms in the syn and obs directories are different.

When I execute the workflow I get an error at the end that says the gradient is zero. Backtracking I see scratch/solver/00[1-4]/adj_solver.log the source min/max is zero. I also see that scratch/solver/00[1-4]/traces/adj/AA.S.BXY traces are zero but I also have files AA.S.HXY that don't appear in the example 2 directory and have non-zero seismograms. Maybe due to the frequencies that I am trying to work in that the channel code has changed and (maybe?) isn't being picked up by seisflows?

Additionally, working in my specfem2d working directory I executed --> bin/xadj_seismogram 0 4 AA.S0006 2
which outputs AA.S0006.BXY.adj which has a legitimate looking waveform. So I think that I've set up a model that is legitimate.

Can anyone provide suggestions? Sorry if this is an obvious one. This is my first try taking an example and making a modification like this. Thanks

bch0w commented 2 months ago

Hi @scott314159, awesome to hear you are building off the examples! That was the original intent for them so I'm happy to see them being used as a jumping off point.

Do you mind sharing your YAML parameter file? If the adjoint sources are showing zeros, it seems to suggest that the preprocessing is messing up or the obs and syn waveforms are the same for some reason. I think you may be on to something with the channel code too. Hoping the parameter file can shed a bit more light on this.

Also, each preprocessing task should create its own log file in logs/*, might be worth digging into those files to see if any error messages popped up that help us diagnose.

FYI I am in conference prep mode and will likely not get a lot of time to look at this over the next week, but it's high on the to-do list and hopefully we can get you up and running soon!

scott314159 commented 2 months ago

@bch0w Thanks for following up on this. Going through your comments: 1) scratch/solver/00[1-4]traces/syn and obs/AA.S00XX.BXY.semd are different 2) Adding the what I told you about the channel codes: I have ll .BXY.adj files (as expected). All 11 files have zero for the ampliutdess. I only have 10 .HXY.adj files. 3 files have zero amplitudes and they are in the middle of the receiver spread. 3) All of the log files in logs// have a zero length (they are empty). I redirect the screen output and also looked at the adj_.log, fwd*.log files. I did not see an error or anything like that. It was only the adjoint log that has entries that seemed odd (like the min/max source amplitude =0). However, I am new to this so it is quite possible that I missed something! 4) No worries about this taking a little while. I know preparing for conferences is a lot of work! 5) Attached is my parameter.yaml file. (note that I renamed it with a txt extension because yaml files are not supported. parameters.yaml.txt

scott314159 commented 2 months ago

@bch0w Thanks for following up on this. Going through your comments: 1) scratch/solver/00[1-4]traces/syn and obs/AA.S00XX.BXY.semd are different 2) Adding the what I told you about the channel codes: I have ll .BXY.adj files (as expected). All 11 files have zero for the ampliutdess. I only have 10 .HXY.adj files. 3 files have zero amplitudes and they are in the middle of the receiver spread. 3) All of the log files in logs// have a zero length (they are empty). I redirect the screen output and also looked at the adj_.log, fwd*.log files. I did not see an error or anything like that. It was only the adjoint log that has entries that seemed odd (like the min/max source amplitude =0). However, I am new to this so it is quite possible that I missed something! 4) No worries about this taking a little while. I know preparing for conferences is a lot of work! 5) Attached is my parameter.yaml file. (note that I renamed it with a txt extension because yaml files are not supported. parameters.yaml.txt

trinitite271 commented 2 months ago

I had exactly the same problem, and I also used the Marmousi2 example in specfem2d/Example/Marmousi2 to create my own example and repalce the model with another model which is 180m distance and 60m depth. I use seisflows with master branches. The true model and init model Looks seems correct. But the misfit are zeros and the observed and predicted shot gather are same (in "scratch", "solver", "002", 'traces', 'syn' and "scratch", "solver", "002", 'traces', 'obs'). The full example I ran was upload in https://github.com/trinitite271/seisflows1

bch0w commented 1 month ago

Hi @scott314159, sorry for taking so long to get back to you on this topic! Just to sum up the issue here since it's been a while since I looked at this:

  1. Synthetic inversion with INIT and TRUE models, verified different
  2. 'Observed' data created with option generate_data
  3. Syn and obs waveforms verified different and SPECFEM's xadj_seismogram creates non-zero adjoint source
  4. SeisFlows producing zero-amplitude adjoint sources

I think this might be related to your observation that "due to the frequencies that I am trying to work in that the channel code has changed and (maybe?) isn't being picked up by seisflows".

SPECFEM will change the channel code based on the sampling rate of the synthetic data, and potentially naming mismatches are causing issues with the preprocessing.

Are there any logs in scratch/preprocess/logs/*. The files in there (if there are any) should let us know what is going wrong with preprocessing/adjoint source creation.

bch0w commented 1 month ago

Hi @trinitite271, thanks for adding in your example here, seems like your issue may be slightly different as you are using the default preprocessing module and the master branch. If you are still facing issues would you mind opening a new issue so we can address it separately? Thanks!

scott314159 commented 1 month ago

@bch0w thanks for getting back to this. Yes, there are log files in scratch/preprocess/logs. In fact ,there are 4 files, one for each source. Attached is one of the files.
002_i01s00.log

bch0w commented 2 weeks ago

Hey @scott314159, thanks for that and for your continued patience. Looking at the log file it seems like much of the data and synthetics have a cross correlation of 1 which would confirm that they are the same, although interestingly I see atleast one measurement where there is some difference waveform:

2024-04-19 12:24:44 [pyatoa DEBU] | Y_0: cc=0.97 / dt=0.00s / dlnA=-0.09

I'm wondering if the data and synthetic are just actually similar enough that with the preprocessing steps preceding adjoint source generation, that they end up looking the same?

I guess I haven't actually seen your model or waveforms yet so I can't say anything definiteively. Could you please provide the data and synthetic waveform files for one station from your problem for me to look at? Thanks!

And sorry again for the delay in response, I'll try to be quicker to get back to you in the future!

bch0w commented 2 weeks ago

Please see responses in this #219, I think they may be relevant here: (https://github.com/adjtomo/seisflows/issues/219#issuecomment-2181489307, https://github.com/adjtomo/seisflows/issues/219#issuecomment-2181506150)

scottm314 commented 1 week ago

@bch0w 1) I tried the solution #219 suggestion and it did not help. I got the same error as last time. 2) The correlation coefficient observation is a good one but, unfortunately, I chose poorly with the log file I suppled. I ran a grep on all of the log files and I have a range of cc from 0.86 to 1. The reason for this is that I kept the true and init and true background velocities the same but I changed the velocity (P and S) of an anomaly in the center of the model. So, I expect some correlations to be 1 because there is no change in the init and true models 3) below are a couple of traces from scratch/solver/003/traces. There is a difference between the waveforms. Not too large so I don't cycle skip. Also, the HXY.adj file has non-zero values but the BXY.adj files is all zeros.
AA.S0004.BXY.adj.txt AA.S0004.HXY.adj.txt AA.S0004.BXY.semd_obs.txt AA.S0004.BXY.semd_syn.txt

Is this what you are after?

Thanks again for your help.

bch0w commented 1 day ago

Hi @scottm314, okay! The search continues for the elusive bug!

I am now feeling like this has to do with the different sampling rates, as you have both BXY and HXY files. From your previous log file (002_i01s00.log) your synthetics are output from SPECFEM as BXY, but the adjoint sources are saved as HXY.

////////////////////////////////////////////////////////////////////////////////
                                 AA.S0002..BXY                                  
////////////////////////////////////////////////////////////////////////////////
...
2024-04-19 12:24:44 [pyatoa INFO] | saving adjoint source to: i01/s00/AA_S0002_HXY

SeisFlows will generate zero-valued adjoint sources for all missing components, which might explain why your HXY.adj files have signal, but the BXY.adj are zero, as it "thinks" there are no adjoint sources since they are named different than expected (HXY instead of BXY).

Is it possible to find a common value of DT in your SPECFEM Par_file for the starting and target models (if they are not already)? I'm wondering if the fact that you are seeing both BXY and HXY suggests that one or the other has a much different sampling rate and it's leading to some naming mismatches.

Hope that makes sense, let me know if you need clarification!