alanrenouf / vCheck-vSphere

vCheck Daily Report for vSphere
MIT License

The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel. #217

Status: Open. ajohn24 opened this issue 10 years ago.

ajohn24 commented 10 years ago

Hi, when I run the vCheck script, it fails while getting the hard disk info in plugin 48. It looks like an earlier plugin disconnects the session to the vCenter server. All the plugins before this one run fine.

Get-HardDisk : 6/12/2014 9:33:00 PM Get-HardDisk Server vc.com not connected. At C:\scripts\Core\vCheck-vSphere-master\Plugins\48 Find VM Disk Format.ps1:6 char:35

Please assist

Sneddo commented 8 years ago

(Quoting an earlier comment) "Therefore, I am thinking that releasing/clearing variables may be needed between plugins."

Possibly... a lot of plugins store their results in a variable, mostly to get the count for the plugin title. The [count] replacement is a better way to do this. It's probably worth going through and fixing them up at some point.

I'm not sure this will fix it, because the memory held by the global info collection will likely outweigh any savings. It should help a little, though.
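The difference Sneddo describes can be shown with a hypothetical plugin. This sketch assumes three things. First, the connection plugin has already put the VMs from Get-VM into `$VM`. Second, the framework replaces the `[count]` placeholder in `$Header` with the number of objects the plugin outputs. Third, the powered-off filter is only an illustration. The plugin is not copied from the repository. Its point is that it no longer keeps a separate copy of its results in a variable just to count them.

```powershell
# Hypothetical plugin, for illustration only.
# ASSUMPTION: $VM has been populated by the vCheck connection plugin.
# ASSUMPTION: the framework replaces [count] in $Header with the number of
# objects the plugin writes to the pipeline.

# Old pattern (comment only, for comparison): the result set is stored in a
# variable so that its Count can be formatted into the header.
#   $Result = @($VM | Where-Object { $_.PowerState -eq "PoweredOff" } | Select-Object Name, Notes)
#   $Result
#   $Header = "Powered off VMs: {0}" -f $Result.Count

# Suggested pattern: write the objects straight to the pipeline and leave
# the counting to the [count] placeholder.
$VM | Where-Object { $_.PowerState -eq "PoweredOff" } | Select-Object Name, Notes

$Title = "Powered Off VMs"
$Header = "Powered off VMs: [count]"
$Comments = "Example plugin; the filter is illustrative only"
$Display = "Table"
$Author = "Example"
$PluginVersion = 1.0
$PluginCategory = "vSphere"
```

Each plugin only saves its own copy of the results. Data collected globally at startup is unchanged, which matches the caveat above.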

jtinouye commented 8 years ago

I had this error too, and I was able to alleviate it by running the following command in my PowerCLI environment:

Set-PowerCLIConfiguration -WebOperationTimeoutSeconds -1

MitoTranin commented 8 years ago

I ran into this issue when running vCheck against my larger vCenter servers, but ONLY when I used a job file. If I ran the check natively (i.e., without a job XML file), it completed successfully, but with a job file I got these errors. At first I did not care much. Eventually, though, I wrote a custom wrapper script to run vCheck against a handful of different vCenter servers in sequence, and that required the job parameter so I could specify which vCenter server to connect to.

I followed jtinouye's suggestion and it works great. I have not seen the error since. I wasn't sure I wanted to make that change permanent for the entire utility server, so I just added this line to the wrapper script:

Set-PowerCLIConfiguration -WebOperationTimeoutSeconds -1 -Scope Session -Confirm:$false

For those interested, here is the main logic of the wrapper script that runs vCheck against multiple vCenter servers in sequence:

# List all of the job file names that need to be run, in order
$Jobs = @(
    "Server1.xml"
    "Server2.xml"
    "Server3.xml"
    "Server4.xml"
)

# Report base path
$ReportPath = "\Reports"

# Turn off the timeout for web operations to alleviate issues with the SSL session being disconnected
# https://github.com/alanrenouf/vCheck-vSphere/issues/217
Set-PowerCLIConfiguration -WebOperationTimeoutSeconds -1 -Scope Session -Confirm:$false

$vCheckBasePath = (Resolve-Path .\).Path
$vCheckCMD = $vCheckBasePath + "\vCheck.ps1"

# Build the output directory path. Read the clock once so the year/month/day
# parts cannot straddle midnight.
$Now = Get-Date
$DateYear = $Now.ToString("yyyy")
$DateMonth = $Now.ToString("MM")
$DateDay = $Now.ToString("dd")

#   NOTE: Change the following line in vCheck.ps1 to ensure you do not double-organize your reports!
#OLD   $ArchiveFilePath = $Outputpath + "\Archives\" + $VIServer
#NEW   $ArchiveFilePath = $Outputpath

$OutputPath = $vCheckBasePath + $ReportPath + "\" + $DateYear + "\" + $DateMonth + "\" + $DateDay
if (-not (Test-Path -PathType Container $OutputPath)) { New-Item -Path $OutputPath -ItemType Directory | Out-Null }

# Run the vCheck reports. The call operator (&) passes the arguments directly,
# which avoids the quoting pitfalls of Invoke-Expression.
foreach ($JobFile in $Jobs) {
    & $vCheckCMD -job $JobFile -Outputpath $OutputPath
}
rnelson0 commented 8 years ago

I am receiving the same error as described in the original post. It seems there are a number of potential fixes, but none are pinpointed as the actual fix. Has any progress been made on this recently, and if not, how can we help progress this?

meoso commented 8 years ago

@rnelson0, I added jtinouye's recommendation to my vCheck.ps1 as well. I put this edit directly before the Internationalization comment, around line 70:

### MY EDIT ############################################################
winrm set winrm/config/winrs '@{MaxMemoryPerShellMB="3072"}'
winrm set winrm/config/winrs '@{MaxProcessesPerShell="100"}'
Set-PowerCLIConfiguration -WebOperationTimeoutSeconds -1 -Confirm:$false
########################################################################

I also tried $WarningPreference = "SilentlyContinue" at some point, but I commented it out for my final run. I don't recall whether it helped, hurt, or made no difference.

If you haven't already, definitely look into running with custom .xml job files. Cutting the plugins down to only the ones you really want will help avoid unexpected errors.

rnelson0 commented 8 years ago

@meoso thanks. I found Remove-vCheckPlugin and used it to trim down the plugins, and that also seems to have fixed it. I will keep those settings in mind if the problem comes back.

One thing to note: I am running this locally. Would the WinRM settings apply in that case, and if so, why?

rnelson0 commented 7 years ago

@meoso Hrm, I still run into the issue, but FAR less often. I don't want to flood the ticket with error reports, so see https://gist.github.com/rnelson0/a1acf936d858adc3836c130418cbe8e6. Any suggestions for further tweaking, or could those errors have separate causes?

Mothra13 commented 7 years ago

Just wanted to add a couple of notes. I run a PowerCLI script out of Jenkins, using a Windows 7 node to execute the modules. The script loops through a collection of about 1,000 VMs and looks at their VI events. With no real pattern, I would occasionally see 'Get-VIEvent Could not establish secure channel for SSL/TLS with authority'. The script did not bomb out and carried on normally. It happened on every run for a very small number of VMs, but not the same ones each time.

For me, increasing the memory from 16GB to 32GB on the node that runs this script seems to have put this to bed. I figured I would share what turned out to be a simple fix.

maZuFC commented 7 years ago

I'm having this issue as well.

Get-VDSwitch : 13/07/2017 10:07:13 Get-VDSwitch Could not establish trust relationship for the SSL/TLS secure channel with authority 'vcenter.domain.com'. At C:\vcheck\Plugins\60 VM\200 VMs on ephemeral portgroup.ps1:4 char:16

Get-NetworkAdapter : The input object cannot be bound to any parameters for the command either because the command does not take pipeline input or the input and its properties do not match any of the parameters that take pipeline input. At C:\vcheck\Plugins\60 VM\200 VMs on ephemeral portgroup.ps1:5 char:9

I have a lot of these entries pointing to the SSL/TLS secure channel... could it be a dodgy certificate? I'm wondering whether recreating the certificate would help...

maZuFC commented 7 years ago

This definitely seems to be an intermittent problem. Like everybody else, I have not been able to find a root cause. I've spent a couple of days on this now and I'm getting fed up with it, so for now I'm just limiting the damage...

In our environment, I've had to disable the following plugins:

79 Find VMs in Uncontrolled Snapshot Mode
106 Find Phantom Snapshots
108 SRM RPO Violations
202 VMs MMU Configuration

Disabling these has made the script run a lot better, but it still fails at random times. It's not perfect, but it does seem more stable.

I'll keep checking the thread and updates to see whether this issue gets resolved.

rnelson0 commented 7 years ago

Working with VMware Support, we determined that the bundled self-signed/untrusted certs were causing the issue. You can get the cert bundle from https://vcenter.example.com; it's the bottom link on the right-hand side. Once I added the CA to the Trusted CAs in the certificate store, I could NOT reproduce this error. Removing the CA brought it back immediately. I'm pretty confident this is the fix.

To make sure I'm explaining it properly, I wrote a blog post that goes into more detail on the findings and remediation. As I emphasize there, I've only been testing it this way for a month, but I've had zero errors across more than 30 manual runs in that time. I think this really is it.
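The remediation above amounts to trusting the vCenter root CA on the machine that runs vCheck. Below is a minimal PowerShell sketch of that step, built on the Windows PKI module's Import-Certificate cmdlet. It relies on several assumptions that do not come from the thread. The CA bundle has already been downloaded from the vCenter landing page and extracted to a local folder (the path in the usage line is hypothetical). The Windows copies of the certificates use the .crt extension. The session is elevated, which writing to the machine-wide root store requires. The function names are illustrative, not part of vCheck or PowerCLI.

```powershell
# Sketch only: trust the vCenter root CA on the machine that runs vCheck.
# ASSUMPTIONS: the CA bundle is already extracted locally, the Windows copies
# of the certificates use the .crt extension, and the session is elevated.

function Get-VcRootCaFile {
    param(
        # Hypothetical folder holding the extracted certificate bundle
        [Parameter(Mandatory)]
        [string]$ExtractedPath
    )
    # Return every .crt file anywhere under the extracted bundle
    Get-ChildItem -Path $ExtractedPath -Recurse -File -Filter "*.crt"
}

function Import-VcRootCa {
    param(
        [Parameter(Mandatory)]
        [string]$ExtractedPath,
        # Machine-wide Trusted Root Certification Authorities store (needs admin)
        [string]$StoreLocation = "Cert:\LocalMachine\Root"
    )
    foreach ($CertFile in (Get-VcRootCaFile -ExtractedPath $ExtractedPath)) {
        # Import-Certificate comes from the Windows PKI module
        Import-Certificate -FilePath $CertFile.FullName -CertStoreLocation $StoreLocation
    }
}

# Example usage (hypothetical path):
# Import-VcRootCa -ExtractedPath "C:\Temp\vcenter-certs"
```

To confirm the change, re-run vCheck in a fresh PowerShell session. Deleting the imported certificate from the same store should bring the error back, which mirrors the test described above.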

Sneddo commented 7 years ago

That's really interesting! Hopefully some of the others can confirm this fix.

It does line up with my own observations, though. I haven't seen it in my current environment, which uses certs from our internal CA, but I would occasionally see it in previous roles...